Hiroyuki Shima 
Tsuneyoshi Nakayama 


Higher Mathematics for Physics 
and Engineering 


Springer


Dr. Hiroyuki Shima, Assistant Professor
Department of Applied Physics
Hokkaido University
Sapporo 060-8628, Japan
shima@eng.hokudai.ac.jp

Dr. Tsuneyoshi Nakayama, Professor
Toyota Physical and Chemical Research Institute
Aichi 480-1192, Japan
Riken-nakayama@mosk.tytlabs.co.jp

ISBN 978-3-540-87863-6 e-ISBN 978-3-540-87864-3 


DOI 10.1007/b138494 
Springer Heidelberg Dordrecht London New York 


Library of Congress Control Number: 2009940406 


© Springer-Verlag Berlin Heidelberg 2010 

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is 
concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, 
reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication 
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 
1965, in its current version, and permission for use must always be obtained from Springer. Violations 
are liable to prosecution under the German Copyright Law. 

The use of general descriptive names, registered names, trademarks, etc. in this publication does not 
imply, even in the absence of a specific statement, that such names are exempt from the relevant protective 
laws and regulations and therefore free for general use. 


Cover design: eStudio Calamar Steinen 
Printed on acid-free paper 


Springer is part of Springer Science+Business Media (www.springer.com) 


To our friends and colleagues 


Preface 


Owing to the rapid advances in the physical sciences and engineering, the
demand for higher-level mathematics is increasing yearly. This book is designed
for advanced undergraduates and graduate students who are interested in the
mathematical aspects of their own fields of study. The reader is assumed to
have a knowledge of undergraduate-level calculus and linear algebra.

There are many books available on mathematics for physics and engineering,
but they all fall into one of two categories: one emphasizes mathematical
rigor and the exposition of definitions or theorems, whereas the other is
concerned primarily with applying mathematics to practical problems. We
believe that neither of these approaches alone is particularly helpful to
physicists and engineers who want to understand the mathematical background
of the subjects with which they are concerned. This book is different in that
it provides a short path to higher mathematics via a combination of these
approaches. A sizable portion of this book is devoted to theorems and
definitions with their proofs, and we are convinced that the study of these
proofs, which range from trivial to difficult, is useful for grasping the
general idea of mathematical logic. Moreover, several problems have been
included at the end of each section, and complete solutions for all of them
are presented in the greatest possible detail. We firmly believe that ours is
a better pedagogical approach than that found in typical textbooks, where
there are many well-polished problems but no solutions.

This book is essentially self-contained and assumes only standard
undergraduate preparation such as elementary calculus and linear algebra. The
first half of the book covers the following three topics: real analysis,
functional analysis, and complex analysis, along with the preliminaries and
four appendixes. Part I focuses on sequences and series of real numbers and of
real functions, with detailed explanations of their convergence properties. We
also emphasize the concepts of Cauchy sequences and the Cauchy criterion that
determine the convergence of infinite real sequences. Part II deals with the
theory of Hilbert spaces, which are the most important class of
infinite-dimensional vector spaces. The completeness property of Hilbert
spaces allows one to develop various types of complete orthonormal
polynomials, as described in the middle of Part II. An introduction to the
Lebesgue integration theory, a subject of ever-increasing importance in
physics, is also presented. Part III describes the theory of complex-valued
functions of one complex variable. All relevant elements including analytic
functions, singularity, residue, continuation, and conformal mapping are
described in a self-contained manner. A thorough understanding of the
fundamentals treated here is important in order to proceed to more advanced
branches of mathematical physics.

In the second half of the volume, the following three specific topics are
discussed: Fourier analysis, differential equations, and tensor analysis. These
three are the most important subjects in both engineering and the physical
sciences, but their rigorous mathematical structures are rarely covered in
ordinary textbooks. We know that mathematical rigor is often unnecessary for
practical use. However, the blind use of mathematical methods as tools may
lead to a lack of understanding of the symbiotic relationship between
mathematics and the physical sciences. We believe that readers who study the
mathematical structures underlying these three subjects in detail will acquire
a better understanding of the theoretical backgrounds associated with their
own fields. Part IV describes the theory of Fourier series, the Fourier
transform, and the Laplace transform, with a special emphasis on the proofs of
their convergence properties. A more contemporary subject, the wavelet
transform, is also described toward the end of Part IV. Part V deals with
ordinary and partial differential equations. The existence theorem and
stability theory for solutions, which serve as the underlying basis for
differential equations, are described with rigorous proofs. Part VI is devoted
to the calculus of tensors in terms of both Cartesian and non-Cartesian
coordinates, along with the essentials of differential geometry. An
alternative tensor theory expressed in terms of abstract vector spaces is
developed toward the end of Part VI.

The authors hope and trust that this book will serve as an introductory 
guide for the mathematical aspects of the important topics in the physical 
sciences and engineering. 


Sapporo, November 2009
Hiroyuki Shima
Tsuneyoshi Nakayama


Contents 


1 Preliminaries
1.1 Basic Notions of a Set
1.1.1 Set and Element
1.1.2 Number Sets
1.1.3 Bounds
1.1.4 Interval
1.1.5 Neighborhood and Contact Point
1.1.6 Closed and Open Sets
1.2 Conditional Statements
1.3 Order of Magnitude
1.3.1 Symbols O, o, and ~
1.3.2 Asymptotic Behavior
1.4 Values of Indeterminate Forms
1.4.1 l'Hôpital's Rule
1.4.2 Several Examples


Part I Real Analysis 


2 Real Sequences and Series
2.1 Sequences of Real Numbers
2.1.1 Convergence of a Sequence
2.1.2 Bounded Sequences
2.1.3 Monotonic Sequences
2.1.4 Limit Superior and Limit Inferior
2.2 Cauchy Criterion for Real Sequences
2.2.1 Cauchy Sequences
2.2.2 Cauchy Criterion
2.3 Infinite Series of Real Numbers
2.3.1 Limits of Infinite Series
2.3.2 Cauchy Criterion for Infinite Series
2.3.3 Absolute and Conditional Convergence
2.3.4 Rearrangements
2.4 Convergence Tests for Infinite Real Series
2.4.1 Limit Tests
2.4.2 Ratio Tests
2.4.3 Root Tests
2.4.4 Alternating Series Test

3 Real Functions
3.1 Fundamental Properties
3.1.1 Limit of a Function
3.1.2 Continuity of a Function
3.1.3 Derivative of a Function
3.1.4 Smooth Functions
3.2 Sequences of Real Functions
3.2.1 Pointwise Convergence
3.2.2 Uniform Convergence
3.2.3 Cauchy Criterion for Series of Functions
3.2.4 Continuity of the Limit Function
3.2.5 Integrability of the Limit Function
3.2.6 Differentiability of the Limit Function
3.3 Series of Real Functions
3.3.1 Series of Functions
3.3.2 Properties of Uniformly Convergent Series of Functions
3.3.3 Weierstrass M-test
3.4 Improper Integrals
3.4.1 Definitions
3.4.2 Convergence of an Improper Integral
3.4.3 Principal Value Integral
3.4.4 Conditions for Convergence


Part II Functional Analysis 


4 Hilbert Spaces
4.1 Hilbert Spaces
4.1.1 Introduction
4.1.2 Abstract Vector Spaces
4.1.3 Inner Product
4.1.4 Geometry of Inner Product Spaces
4.1.5 Orthogonality
4.1.6 Completeness of Vector Spaces
4.1.7 Several Examples of Hilbert Spaces
4.2 Hierarchical Structure of Vector Spaces
4.2.1 Precise Definitions of Vector Spaces
4.2.2 Metric Spaces
4.2.3 Normed Spaces
4.2.4 Subspaces of a Normed Space
4.2.5 Basis of a Vector Space: Revisited
4.2.6 Orthogonal Bases in Hilbert Spaces
4.3 Hilbert Spaces of ℓ² and L²
4.3.1 Completeness of the ℓ² Spaces
4.3.2 Completeness of the L² Spaces
4.3.3 Mean Convergence
4.3.4 Generalized Fourier Coefficients
4.3.5 Riesz–Fischer Theorem
4.3.6 Isomorphism between ℓ² and L²

5 Orthonormal Polynomials
5.1 Polynomial Approximations
5.1.1 Weierstrass Theorem
5.1.2 Existence of Complete Orthonormal Sets of Polynomials
5.1.3 Legendre Polynomials
5.1.4 Fourier Series
5.1.5 Spherical Harmonic Functions
5.2 Classification of Orthonormal Functions
5.2.1 General Rodrigues Formula
5.2.2 Classification of the Polynomials
5.2.3 The Recurrence Formula
5.2.4 Coefficients of the Recurrence Formula
5.2.5 Roots of Orthogonal Polynomials
5.2.6 Differential Equations Satisfied by the Polynomials
5.2.7 Generating Functions (I)
5.2.8 Generating Functions (II)
5.3 Chebyshev Polynomials
5.3.1 Minimax Property
5.3.2 A Concise Representation
5.3.3 Discrete Orthogonality Relation
5.4 Applications in Physics and Engineering
5.4.1 Quantum-Mechanical State in a Harmonic Potential
5.4.2 Electrostatic Potential Generated by a Multipole

6 Lebesgue Integrals
6.1 Measure and Summability
6.1.1 Riemann Integral Revisited
6.1.2 Measure
6.1.3 The Probability Measure
6.1.4 Support and Area of a Step Function
6.1.5 α-Summability
6.1.6 Properties of α-Summable Functions
6.2 Lebesgue Integral
6.2.1 Lebesgue Measure
6.2.2 Definition of the Lebesgue Integral
6.2.3 Riemann Integrals vs. Lebesgue Integrals
6.2.4 Properties of the Lebesgue Integrals
6.2.5 Null-Measure Property of Countable Sets
6.2.6 The Concept of Almost Everywhere
6.3 Important Theorems for Lebesgue Integrals
6.3.1 Monotone Convergence Theorem
6.3.2 Dominated Convergence Theorem (I)
6.3.3 Fatou Lemma
6.3.4 Dominated Convergence Theorem (II)
6.3.5 Fubini Theorem
6.4 The Lebesgue Spaces Lᵖ
6.4.1 The Spaces of Lᵖ
6.4.2 Hölder Inequality
6.4.3 Minkowski Inequality
6.4.4 Completeness of Lᵖ Spaces
6.5 Applications in Physics and Engineering
6.5.1 Practical Significance of Lebesgue Integrals
6.5.2 Contraction Mapping
6.5.3 Preliminaries for the Central Limit Theorem
6.5.4 Central Limit Theorem
6.5.5 Proof of the Central Limit Theorem


Part III Complex Analysis 


7 Complex Functions
7.1 Analytic Functions
7.1.1 Continuity and Differentiability
7.1.2 Definition of an Analytic Function
7.1.3 Cauchy–Riemann Equations
7.1.4 Harmonic Functions
7.1.5 Geometric Interpretation of Analyticity
7.2 Complex Integrations
7.2.1 Integration of Complex Functions
7.2.2 Cauchy Theorem
7.2.3 Integrations on a Multiply Connected Region
7.2.4 Primitive Functions
7.3 Cauchy Integral Formula and Related Theorem
7.3.1 Cauchy Integral Formula
7.3.2 Goursat Formula
7.3.3 Absence of Extrema in Analytic Regions
7.3.4 Liouville Theorem
7.3.5 Fundamental Theorem of Algebra
7.3.6 Morera Theorem
7.4 Series Representations
7.4.1 Circle of Convergence
7.4.2 Singularity on the Radius of Convergence
7.4.3 Taylor Series
7.4.4 Apparent Paradoxes
7.4.5 Laurent Series
7.4.6 Regular and Principal Parts
7.4.7 Uniqueness of Laurent Series
7.4.8 Techniques for Laurent Expansion
7.5 Applications in Physics and Engineering
7.5.1 Fluid Dynamics
7.5.2 Kutta–Joukowski Theorem
7.5.3 Blasius Formula

8 Singularity and Continuation
8.1 Singularity
8.1.1 Isolated Singularities
8.1.2 Nonisolated Singularities
8.1.3 Weierstrass Theorem for Essential Singularities
8.1.4 Rational Functions
8.2 Multivaluedness
8.2.1 Multivalued Functions
8.2.2 Riemann Surfaces
8.2.3 Branch Point and Branch Cut
8.3 Analytic Continuation
8.3.1 Continuation by Taylor Series
8.3.2 Function Elements
8.3.3 Uniqueness Theorem
8.3.4 Conservation of Functional Equations
8.3.5 Continuation Around a Branch Point
8.3.6 Natural Boundaries
8.3.7 Technique of Analytic Continuations
8.3.8 The Method of Moment

9 Contour Integrals
9.1 Calculus of Residues
9.1.1 Residue Theorem
9.1.2 Remarks on Residues
9.1.3 Winding Number
9.1.4 Ratio Method
9.1.5 Evaluating the Residues
9.2 Applications to Real Integrals
9.2.1 Classification of Evaluable Real Integrals
9.2.2 Type 1: Integrals of f(cos θ, sin θ)
9.2.3 Type 2: Integrals of Rational Function
9.2.4 Type 3: Integrals of f(x) e^(ix)
9.2.5 Type 4: Integrals of f(x)/x^α
9.2.6 Type 5: Integrals of f(x) log x
9.3 More Applications of Residue Calculus
9.3.1 Integrals on Rectangular Contours
9.3.2 Fresnel Integrals
9.3.3 Summation of Series
9.3.4 Langevin and Riemann zeta Functions
9.4 Argument Principle
9.4.1 The Principle
9.4.2 Variation of the Argument
9.4.3 Extension of the Argument Principle
9.4.4 Rouché Theorem
9.5 Dispersion Relations
9.5.1 Principal Value Integrals
9.5.2 Several Remarks
9.5.3 Dispersion Relations
9.5.4 Kramers–Kronig Relations
9.5.5 Subtracted Dispersion Relation
9.5.6 Derivation of Dispersion Relations

10 Conformal Mapping
10.1 Fundamentals
10.1.1 Conformal Property of Analytic Functions
10.1.2 Scale Factor
10.1.3 Mapping of a Differential Area
10.1.4 Mapping of a Tangent Line
10.1.5 The Point at Infinity
10.1.6 Singular Point at Infinity
10.2 Elementary Transformations
10.2.1 Linear Transformations
10.2.2 Bilinear Transformations
10.2.3 Miscellaneous Transformations
10.2.4 Mapping of Finite-Radius Circle
10.2.5 Invariance of the Cross Ratio
10.3 Applications to Boundary-Value Problems
10.3.1 Schwarz–Christoffel Transformation
10.3.2 Derivation of the Schwarz–Christoffel Transformation
10.3.3 The Method of Inversion
10.4 Applications in Physics and Engineering
10.4.1 Electric Potential Field in a Complicated Geometry
10.4.2 Joukowsky Airfoil


Part IV Fourier Analysis 


11 Fourier Series
11.1 Basic Properties
11.1.1 Definition
11.1.2 Dirichlet Theorem
11.1.3 Fourier Series of Periodic Functions
11.1.4 Half-range Fourier Series
11.1.5 Fourier Series of Nonperiodic Functions
11.1.6 The Rate of Convergence
11.1.7 Fourier Series in Higher Dimensions
11.2 Mean Convergence of Fourier Series
11.2.1 Mean Convergence Property
11.2.2 Dirichlet and Fejér Integrals
11.2.3 Proof of the Mean Convergence of Fourier Series
11.2.4 Parseval Identity
11.2.5 Riemann–Lebesgue Theorem
11.3 Uniform Convergence of Fourier Series
11.3.1 Criterion for Uniform and Pointwise Convergence
11.3.2 Fejér Theorem
11.3.3 Proof of Uniform Convergence
11.3.4 Pointwise Convergence at Discontinuous Points
11.3.5 Gibbs Phenomenon
11.3.6 Overshoot at a Discontinuous Point
11.4 Applications in Physics and Engineering
11.4.1 Temperature Variation of the Ground
11.4.2 String Vibration Under Impact

12 Fourier Transformation
12.1 Fourier Transform
12.1.1 Derivation of Fourier Transform
12.1.2 Fourier Integral Theorem
12.1.3 Proof of the Fourier Integral Theorem
12.1.4 Inverse Relations of the Half-width
12.1.5 Parseval Identity for Fourier Transforms
12.1.6 Fourier Transforms in Higher Dimensions
12.2 Convolution and Correlations
12.2.1 Convolution Theorem
12.2.2 Cross-Correlation Functions
12.2.3 Autocorrelation Functions
12.3 Discrete Fourier Transform
12.3.1 Definitions
12.3.2 Inverse Transform
12.3.3 Nyquist Frequency and Aliasing
12.3.4 Sampling Theorem
12.3.5 Fast Fourier Transform
12.3.6 Matrix Representation of FFT Algorithm
12.3.7 Decomposition Method for FFT
12.4 Applications in Physics and Engineering
12.4.1 Fraunhofer Diffraction I
12.4.2 Fraunhofer Diffraction II
12.4.3 Amplitude Modulation Technique

13 Laplace Transformation
13.1 Basic Operations
13.1.1 Definitions
13.1.2 Several Remarks
13.1.3 Significance of Analytic Continuation
13.1.4 Convergence of Laplace Integrals
13.1.5 Abscissa of Absolute Convergence
13.1.6 Laplace Transforms of Elementary Functions
13.2 Properties of Laplace Transforms
13.2.1 First Shifting Theorem
13.2.2 Second Shifting Theorem
13.2.3 Laplace Transform of Periodic Functions
13.2.4 Laplace Transform of Derivatives and Integrals
13.2.5 Laplace Transforms Leading to Multivalued Functions
13.3 Convergence Theorems for Laplace Integrals
13.3.1 Functions of Exponential Order
13.3.2 Convergence for Exponential-Order Cases
13.3.3 Uniform Convergence for Exponential-Order Cases
13.3.4 Convergence for General Cases
13.3.5 Uniform Convergence for General Cases
13.3.6 Distinction Between Exponential-Order Cases and General Cases
13.3.7 Analytic Property of Laplace Transforms
13.4 Inverse Laplace Transform
13.4.1 The Two-Sided Laplace Transform
13.4.2 Inverse of the Two-Sided Laplace Transform
13.4.3 Inverse of the One-Sided Laplace Transform
13.4.4 Useful Formula for Inverse Laplace Transformation
13.4.5 Evaluating Inverse Transformations
13.4.6 Inverse Transform of Multivalued Functions
13.5 Applications in Physics and Engineering
13.5.1 Electric Circuits I
13.5.2 Electric Circuits II


14 Wavelet Transformation
14.1 Continuous Wavelet Analyses
14.1.1 Definition of Wavelet
14.1.2 The Wavelet Transform
14.1.3 Correlation Between Wavelet and Signal
14.1.4 Actual Application of the Wavelet Transform
14.1.5 Inverse Wavelet Transform
14.1.6 Noise Reduction Technique
14.2 Discrete Wavelet Analysis
14.2.1 Discrete Wavelet Transforms
14.2.2 Complete Orthonormal Wavelets
14.2.3 Multiresolution Analysis
14.2.4 Orthogonal Decomposition
14.2.5 Constructing an Orthonormal Basis
14.2.6 Two-Scale Relations
14.2.7 Constructing the Mother Wavelet
14.2.8 Multiresolution Representation
14.3 Fast Wavelet Transformation
14.3.1 Generalized Two-Scale Relations
14.3.2 Decomposition Algorithm
14.3.3 Reconstruction Algorithm


Part V Differential Equations 


15 Ordinary Differential Equations
15.1 Concepts of Solutions
15.1.1 Definition of Ordinary Differential Equations
15.1.2 Explicit Solution
15.1.3 Implicit Solution
15.1.4 General and Particular Solutions
15.1.5 Singular Solutions
15.1.6 Integral Curve and Direction Field
15.2 Existence Theorem for the First-Order ODE
15.2.1 Picard Method
15.2.2 Properties of Successive Approximations
15.2.3 Existence Theorem and Lipschitz Condition
15.2.4 Uniqueness Theorem
15.2.5 Remarks on the Two Theorems
15.3 Sturm–Liouville Problems
15.3.1 Sturm–Liouville Equation
15.3.2 Conversion into a Sturm–Liouville Equation
15.3.3 Self-adjoint Operators
15.3.4 Required Boundary Condition
15.3.5 Reality of Eigenvalues


16 System of Ordinary Differential Equations
16.1 Systems of ODEs
16.1.1 Systems of the First-Order ODEs
16.1.2 Column-Vector Notation
16.1.3 Reducing the Order of ODEs
16.1.4 Lipschitz Condition in Vector Spaces
16.2 Linear System of ODEs
16.2.1 Basic Terminology
16.2.2 Vector Space of Solutions
16.2.3 Fundamental Systems of Solutions
16.2.4 Wronskian for a System of ODEs
16.2.5 Liouville Formula for a Wronskian
16.2.6 Wronskian for an nth-Order Linear ODE
16.2.7 Particular Solution of an Inhomogeneous System
16.3 Autonomous Systems of ODEs
16.3.1 Autonomous System
16.3.2 Trajectory
16.3.3 Critical Point
16.3.4 Stability of a Critical Point
16.3.5 Linear Autonomous System
16.4 Classification of Critical Points
16.4.1 Improper Node
16.4.2 Saddle Point
16.4.3 Proper Node
16.4.4 Spiral Point
16.4.5 Center
16.4.6 Limit Cycle
16.5 Applications in Physics and Engineering
16.5.1 Van der Pol Generator

17 Partial Differential Equations
17.1 Basic Properties
17.1.1 Definitions
17.1.2 Subsidiary Conditions
17.1.3 Linear and Homogeneous PDEs
17.1.4 Characteristic Equation
17.1.5 Second-Order PDEs
17.1.6 Classification of Second-Order PDEs

17.2 The Laplacian Operator
17.2.1 Maximum and Minimum Theorem
17.2.2 Uniqueness Theorem
17.2.3 Symmetric Properties of the Laplacian

17.3 The Diffusion Operator
17.3.1 The Diffusion Equations in Bounded Domains
17.3.2 Maximum and Minimum Theorem
17.3.3 Uniqueness Theorem

17.4 The Wave Operator
17.4.1 The Cauchy Problem
17.4.2 Homogeneous Wave Equations
17.4.3 Inhomogeneous Wave Equations
17.4.4 Wave Equations in Finite Domains

17.5 Applications in Physics and Engineering
17.5.1 Wave Equations for Vibrating Strings
17.5.2 Diffusion Equations for Heat Conduction


Part VI Tensor Analyses 


18 Cartesian Tensors
18.1 Rotation of Coordinate Axes
18.1.1 Tensors and Coordinate Transformations
18.1.2 Summation Convention
18.1.3 Cartesian Coordinate System
18.1.4 Rotation of Coordinate Axes
18.1.5 Orthogonal Relations
18.1.6 Matrix Representations
18.1.7 Determinant of a Matrix
18.2 Cartesian Tensors
18.2.1 Cartesian Vectors
18.2.2 A Vector and a Geometric Arrow
18.2.3 Cartesian Tensors
18.2.4 Scalars
18.3 Pseudotensors
18.3.1 Improper Rotations
18.3.2 Pseudovectors
18.3.3 Pseudotensors
18.3.4 Levi-Civita Symbols
18.4 Tensor Algebra
18.4.1 Addition and Subtraction
18.4.2 Contraction
18.4.3 Outer and Inner Products
18.4.4 Symmetric and Antisymmetric Tensors
18.4.5 Equivalence of an Antisymmetric Second-Order Tensor to a Pseudovector
18.4.6 Quotient Theorem
18.4.7 Quotient Theorem for Two-Subscripted Quantities
18.5 Applications in Physics and Engineering
18.5.1 Inertia Tensor
18.5.2 Tensors in Electromagnetism in Solids
18.5.3 Electromagnetic Field Tensor
18.5.4 Elastic Tensor


19 Non-Cartesian Tensors
19.1 Curvilinear Coordinate Systems
19.1.1 Local Basis Vectors
19.1.2 Reciprocity Relations
19.1.3 Transformation Law of Covariant Basis Vectors
19.1.4 Transformation Law of Contravariant Basis Vectors
19.1.5 Components of a Vector
19.1.6 Components of a Tensor
19.1.7 Mixed Components of a Tensor
19.1.8 Kronecker Delta

19.2 Metric Tensor
19.2.1 Definition
19.2.2 Geometric Role of Metric Tensors
19.2.3 Riemann Space and Metric Tensor
19.2.4 Elements of Arc, Area, and Volume
19.2.5 Scale Factors
19.2.6 Representation of Basis Vectors in Derivatives
19.2.7 Index Lowering and Raising

19.3 Christoffel Symbols
19.3.1 Derivatives of Basis Vectors
19.3.2 Nontensor Character
19.3.3 Properties of Christoffel Symbols
19.3.4 Alternative Expression

19.4 Covariant Derivatives
19.4.1 Covariant Derivatives of Vectors
19.4.2 Remarks on Covariant Derivatives
19.4.3 Covariant Derivatives of Tensors
19.4.4 Vector Operators in Tensor Form

19.5 Applications in Physics and Engineering
19.5.1 General Relativity Theory
19.5.2 Riemann Tensor
19.5.3 Energy-Momentum Tensor
19.5.4 Einstein Field Equation

20 Tensor as Mapping
20.1 Vector as a Linear Function
20.1.1 Overview
20.1.2 Vector Spaces Revisited
20.1.3 Vector Spaces of Linear Functions
20.1.4 Dual Space
20.1.5 Equivalence Between Vectors and Linear Functions

20.2 Tensor as Multilinear Function
20.2.1 Direct Product of Vector Spaces
20.2.2 Multilinear Functions
20.2.3 Tensor Product
20.2.4 General Definition of Tensors
20.3 Components of Tensors
20.3.1 Basis of a Tensor Space
20.3.2 Transformation Laws of Tensors
20.3.3 Natural Isomorphism
20.3.4 Inner Product in Tensor Language
20.3.5 Index Lowering and Raising in Tensor Language


Part VII Appendixes 


A Proof of the Bolzano–Weierstrass Theorem
A.1 Limit Points
A.2 Cantor Theorem
A.3 Bolzano–Weierstrass Theorem

B Dirac δ Function
B.1 Basic Properties
B.2 Representation as a Limit of Function
B.3 Remarks on Representation 4

C Proof of Weierstrass Approximation Theorem

D Tabulated List of Orthonormal Polynomial Functions


1 


Preliminaries 


This chapter provides the basic notation, terminology, and abbreviations that
we will be using, particularly in real analysis, and is designed to serve as a
reference rather than as a systematic exposition.


1.1 Basic Notions of a Set 


1.1.1 Set and Element 


A set is a collection of elements (or points) that are definite and separate 
objects. If a is an element of a set S, we write 


$a \in S$.


Otherwise, we write


$a \notin S$


to indicate that a does not belong to S. If a set contains no elements, it is
called an empty set and is designated by $\emptyset$.

A set may be defined by listing its elements or by providing a rule that 
determines which elements belong to it. For example, we write 


$X = \{x_1, x_2, x_3, \cdots, x_n\}$


to indicate that X is a set with n elements: $x_1, x_2, \cdots, x_n$. When a set
contains a finite (infinite) number of elements, it is called a finite (infinite)
set.

A set X is said to be a subset of Y if every element in X is also an element 
in Y. This relationship is expressed as 


$X \subseteq Y$.


When $X \subseteq Y$ and $Y \subseteq X$, the two sets have the same elements and are said
to be equal, which is expressed by


$X = Y$.


But when $X \subseteq Y$ and $X \neq Y$, then X is called a proper subset of Y, and
we use the more specific expression


$X \subset Y$.

The intersection of two sets X and Y, denoted by

$X \cap Y$,

consists of elements that are contained in both X and Y. The union

$X \cup Y$

consists of all the elements contained in either X or Y, including those
contained in both X and Y. When the two sets X and Y have no element in
common (i.e., when $X \cap Y = \emptyset$), X and Y are said to be disjoint.

For two sets A and B, we define their difference by the set


$\{x : x \in A,\ x \notin B\}$


and denote it by $A \backslash B$ (see Fig. 1.1). In particular, if A contains all the sets
under discussion, we say that A is the universal set and $A \backslash B$ is called the
complementary set or complement of B.




Fig. 1.1. Left: The difference of two sets A and B. Right: The complementary set 
or complement of B in A 


1.1.2 Number Sets 
Our abbreviations for fundamental number systems are given by 


$\mathbb{N}$: The set of all positive integers, not including zero.

$\mathbb{Z}$: The set of all integers.

$\mathbb{Q}$: The set of all rational numbers.

$\mathbb{R}$: The set of all real numbers.

$\mathbb{C}$: The set of all complex numbers.


The symbol $\mathbb{R}^n$ denotes an n-dimensional Euclidean space (see Sects.
4.1.3 and 19.2.3). Points in $\mathbb{R}^n$ are denoted by boldface, say, $\mathbf{x}$; the
coordinates of $\mathbf{x}$ are denoted by the ordered n-tuple $(x_1, x_2, \cdots, x_n)$, where $x_i \in \mathbb{R}$.
We also use the extended real number system defined by


$\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, \infty\}$.


1.1.3 Bounds 


The precise terminology for bounds of sets of real numbers follows. Throughout, we
assume S to be a set of real numbers.


♠ Bounds of a set:
1. A real number b such that $x \le b$ for all $x \in S$ is called an upper
bound of S.


2. A real number a such that $x \ge a$ for all $x \in S$ is called a lower
bound of S.


Figure 1.2 illustrates the point. We say that a set S is bounded above or
bounded below if it has an upper bound or a lower bound, respectively.
In particular, when a set S is bounded above and below simultaneously, it
is a bounded set. If a set S is not bounded, then it is said to be an
unbounded set.

It follows from these definitions that if b is an upper bound of S, any
number greater than b will also be an upper bound of S. Thus it makes sense
to seek the smallest among such upper bounds. The same applies to the
lower bounds of S if it is bounded below. In fact, the two extreme bounds, the
smallest upper bound and the largest lower bound, are referred to by specific
names as follows:


♠ Least upper bound:
An element $b \in \mathbb{R}$ is called the least upper bound (abbreviated by
l.u.b.) or supremum of S if
(i) b is an upper bound of S, and
(ii) there is no upper bound of S that is smaller than b.


♠ Greatest lower bound:
An element $a \in \mathbb{R}$ is called the greatest lower bound (abbreviated
by g.l.b.) or infimum of S if
(i) a is a lower bound of S, and
(ii) there is no lower bound of S that is greater than a.


Fig. 1.2. In all the figures, the points a and b are lower and upper bounds of S,
respectively. In particular, the point $a_g$ is the greatest lower bound, and the point $b_r$ is
the least upper bound

In symbols, the supremum and infimum of S are denoted, respectively, by


$\sup S$ and $\inf S$.


We must emphasize the fact that the supremum and infimum of the set S
may or may not belong to S. For instance, the set $S = \{x : x < 1\}$ has the
supremum 1, which does not belong to S. Nevertheless, particularly when
S is finite, we have


$\sup S = \max S$ and $\inf S = \min S$,


where max S and min S denote the maximum and minimum of S, respectively,
both of which belong to S.
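
The contrast between an attained and an unattained supremum can be sketched numerically. The following Python fragment is only an illustration: the helper name sup_inf_finite and the sample sets are assumptions introduced here, not part of the text. It shows a finite set whose supremum equals its maximum, and elements of $S = \{x : x < 1\}$ that approach the supremum 1 without reaching it.

```python
from fractions import Fraction


def sup_inf_finite(values):
    """For a finite, non-empty set, sup = max and inf = min (hypothetical helper)."""
    s = set(values)
    return max(s), min(s)


# Finite set: both bounds are attained inside the set.
finite_sup, finite_inf = sup_inf_finite([3, -1, 4, 1, 5])

# Elements 1 - 10**(-k) of S = {x : x < 1}: each lies in S, yet none equals 1.
approach = [1 - Fraction(1, 10**k) for k in range(1, 6)]
gaps = [1 - x for x in approach]  # distance of each element from the supremum 1

print(finite_sup, finite_inf)
print([str(g) for g in gaps])
```

Exact rational arithmetic is used so that the shrinking gaps are not blurred by rounding.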


1.1.4 Interval 


When a set of real numbers is bounded above or below (or both), it is referred 
to as an interval; there are several classes of intervals as listed below. 




♠ Intervals: Given a real variable x, the set of all values of x such that
1. $a \le x \le b$ is a closed interval, denoted by [a, b].


2. $a < x < b$ is a bounded open interval, denoted by (a, b).


3. $a < x$ and $x < b$ are unbounded open intervals, denoted by $(a, \infty)$
and $(-\infty, b)$, respectively.


Sets of points {x} such that

$a \le x < b$, $a < x \le b$, $a \le x$, $x \le b$

may be referred to as semiclosed intervals; see Sect. 1.1.5 for more rigorous
definitions. Every interval $I_1$ contained in another interval $I_2$ is a subinterval
of $I_2$.


1.1.5 Neighborhood and Contact Point 


The following is a preliminary definition that will be significant in the discus- 
sions on continuity and convergence properties of sets and functions. 


♠ Neighborhoods:
Let $x \in \mathbb{R}$. A set $V \subseteq \mathbb{R}$ is called a neighborhood of x if there is a
number $\varepsilon > 0$ such that


$(x - \varepsilon, x + \varepsilon) \subseteq V$.


In line with the idea of neighborhoods, we introduce the following important 
concept (see Fig. 1.3): 


♠ Contact points:

Assume a point $x \in \mathbb{R}$ and a set $S \subseteq \mathbb{R}$. Then x is called a contact
point of S if and only if every neighborhood of x contains at least one
point of S.


Remark. A contact point of S may or may not belong to S. In contrast, a 
point x € S is necessarily a contact point of S. 


Obviously, every point of S is a contact point of S. In particular, when S is a
single-element set given by $S = \{x_0\}$ with $x_0 \in \mathbb{R}$, then $x_0$ is a contact point
of S since every neighborhood of $x_0$ contains $x_0$ itself. The collection of all
contact points of a set S is called the closure of S and is denoted by [S].




Fig. 1.3. Case A: x is a limit point (and thus a contact point) of S. Case B: x is 
not a contact point of S. Case C: x is an isolated point (and thus a contact point) 
of S 


Contact points can be classified as follows (see again Fig. 1.3): 


♠ Limit points:
A contact point $x \in \mathbb{R}$ is called a limit point of the set $S \subseteq \mathbb{R}$ if and
only if every neighborhood V of x contains a point of S different from x.


♠ Isolated points:
A contact point x is called an isolated point of S if and only if x has
a neighborhood V in which x is the only point belonging to S.


In plain words, a limit point x is a point such that every interval $(x - \varepsilon, x + \varepsilon)$
contains an infinite number of points of S, regardless of the smallness of $\varepsilon$. A
limit point may be referred to as a cluster point or accumulation point,
depending on the context. The symbol $S'$ is commonly used to denote the set
of limit points of S.


Examples 1. If S is the set of rational numbers in the interval [0,1], then 
every point of [0,1], rational or not, is a limit point of S. 

2. The integer set Z has no limit point; it has an infinite number of isolated 
points. 

3. The origin is the limit point of the set $\{1/m : m \in \mathbb{N}\}$.




Remark. From the definition, a limit point of a set need not belong to the set.
For instance, x = 1 is a limit point of the set $S = \{x : x \in \mathbb{R},\ x > 1\}$, but
it does not belong to S. In contrast, an isolated point of S must lie in S.


Limit points are further divided into two classes. A limit point x of a set S
is called an interior point of S if and only if x has a neighborhood $V \subseteq S$.
Otherwise, it is called a boundary point of S. Figure 1.4 is a schematic
illustration of the difference between interior and boundary points.




Fig. 1.4. All four points are limit points of S. Among them, b and c are interior 
points, whereas a and d are boundary points 


1.1.6 Closed and Open Sets 


Closed and open sets are defined in terms of the concepts of contact points
and closure. Recall that the closure of S, denoted by [S], is the set of all contact
points of S, which is the union of two sets: all limit points and all isolated points
of S.


♠ Closed sets:
A set $S \subseteq \mathbb{R}$ is closed if [S] = S, i.e., if S coincides with its own closure.


♠ Open sets:
A set $S \subseteq \mathbb{R}$ is open if S consists entirely of its interior points and has
no boundary points.


It follows intuitively that a set $S \subseteq \mathbb{R}$ is open if and only if its
complementary set is closed. The proof is given in Exercise 4 in this chapter. Note that
the condition $[S] \neq S$ is inconclusive as to whether S is open or not.


Examples 1. Every single-element set $S = \{x_0\}$ with $x_0 \in \mathbb{R}$ is closed since
[S] = S.

2. Every set consisting of a finite number of points is closed.

3. For any real number x, the set $\mathbb{R} \backslash \{x\}$ is open since {x} is closed.

4. The intervals [a, b], $[a, \infty)$, and $(-\infty, b]$ are all closed, which is proven by
considering their closures.

5. The interval [a, b) is neither closed nor open. In fact, it is not closed since
it excludes its boundary point b, and it is not open since it contains its
boundary point a.


Exercises 


1. Give the supremum and infimum of each of the following sets:
(1) $S = \{x : 0 < x < 5\}$.
(2) $S = \{x : x \in \mathbb{Q} \text{ and } x^2 < 2\}$.
(3) $S = \{x : x = 3 + 1/n,\ n \in \mathbb{N}\}$.


Solution: (1) $\sup S = 5$, $\inf S = 0$. (2) $\sup S = \sqrt{2}$, $\inf S = -\sqrt{2}$.
(3) $\sup S = 4$, $\inf S = 3$. ♣


2. Suppose S to be any of the intervals: (a, b), [a, b), (a, b], or [a, b]. Show that


$\sup S = b$, $\inf S = a$.


Solution: Take S = (a, b). Since x < b for all $x \in S$, b serves
as one of the upper bounds of S. We show that b is indeed the least
upper bound. To see this, we first assume that u is another upper
bound of S such that u < b; then $a < u < (u + b)/2 < b$. This
implies that


$\frac{u + b}{2} \in S$ and $u < \frac{u + b}{2}$,

which contradicts the assumption that u is an upper bound of S.
Hence, u > b; i.e., any upper bound other than b must be larger
than b. We thus conclude that $b = \sup S$. The proof is similar for
the other three cases. ♣


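3. Show that the set of integers has no limit point, i.e., the set $\mathbb{Z}'$ of limit points of $\mathbb{Z}$ is empty.


Solution: Take any $x \in \mathbb{R}$, and let $\varepsilon = \min\{|n - x| : n \in \mathbb{Z},\ n \neq x\}$, which is positive.
The interval $(x - \varepsilon, x + \varepsilon)$ contains no integers other than possibly x itself; hence,
$x \notin \mathbb{Z}'$. Since this is the case for any $x \in \mathbb{R}$, we conclude that $\mathbb{Z}$ has no
limit point and is totally composed of isolated points. ♣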


4. Show that a set $S \subseteq \mathbb{R}$ is open if and only if its complementary set $\mathbb{R} \backslash S$
is closed.


Solution: If S is open, then every point $x \in S$ has a neighborhood
contained in S. Therefore no point $x \in S$ can be a contact point of
$\mathbb{R} \backslash S$. In other words, if x is a contact point of $\mathbb{R} \backslash S$, then $x \in \mathbb{R} \backslash S$,
i.e., $\mathbb{R} \backslash S$ is closed.

Conversely, if $\mathbb{R} \backslash S$ is closed, then any point $x \in S$ must have a
neighborhood contained in S, since otherwise every neighborhood
of x would contain points of $\mathbb{R} \backslash S$, i.e., x would be a contact point
of $\mathbb{R} \backslash S$ not in $\mathbb{R} \backslash S$. Therefore S is open. ♣


1.2 Conditional Statements 


Phrases such as if... then..., and ... if and only if ... are frequently used to 
connect simple statements that can be described as either true or false. For 
the sake of typographical convenience, there are conventional logical symbols 
for representing such phrases. 

Suppose P and Q are two different statements. The compound statements 


if P then Q 


and 
P implies Q 


mean that if P is true then Q is true. This is written symbolically as 
$P \Rightarrow Q$. (1.1)
We say that 


P is a sufficient condition for Q 


or 


Q is a necessary condition for P.


In the above context, P stands for the hypothesis or assumption, and Q is the 
conclusion. 


Remark. To prove the implication (1.1) in actual problems, it suffices to ex- 
clude the possibility that P is true and Q is false. This may be done in one 
of three ways. 


1. Assume that P is true and prove that Q is true (direct proof). 

2. Assume that Q is false and prove that P is false (contrapositive proof). 

3. Assume that P is true and Q is false, and then prove that this leads to a 
contradiction (proof by contradiction). 
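
The equivalence of the three proof strategies above can be checked mechanically with a truth table. The Python sketch below is an illustration only; the function names implies and table are assumptions introduced here. For each assignment of truth values it compares the implication itself, its contrapositive, and the statement that "P true and Q false" is impossible.

```python
from itertools import product


def implies(p, q):
    """Truth value of 'P implies Q': false only when P is true and Q is false."""
    return (not p) or q


def table():
    """Rows (P, Q, direct, contrapositive, no_counterexample) for all truth values."""
    rows = []
    for p, q in product([True, False], repeat=2):
        direct = implies(p, q)                  # P => Q
        contrapositive = implies(not q, not p)  # (not Q) => (not P)
        no_counterexample = not (p and not q)   # excludes "P true, Q false"
        rows.append((p, q, direct, contrapositive, no_counterexample))
    return rows


for row in table():
    print(row)
```

In every row the last three entries coincide, which is why any one of the three strategies establishes (1.1).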


When P implies Q and Q implies P, we abbreviate this to 


$P \Leftrightarrow Q$,


and we say that 
P is equivalent to Q 


or, more commonly, 


P if and only if Q. 
This also means that P is a necessary and sufficient condition for Q. 
Examples Observe that
$x = 1 \Rightarrow x^2 = 1$ and $x = -1 \Rightarrow x^2 = 1$.
Conversely, we see that
$x^2 = 1 \Rightarrow x = -1$ or $1$.
Therefore, we conclude that


$x^2 = 1 \Leftrightarrow x \in \{-1, 1\}$.


1.3 Order of Magnitude 


1.3.1 Symbols O, o, and ~


We use the notations O, o, and ~ to express orders of magnitude. To explain
their use, we consider the behavior of functions f(x) and g(x) in a
neighborhood of a point $x_0$.


1. We write
$f(x) = O(g(x)), \quad x \to x_0$


if there exists a positive constant A such that


$|f(x)| \le A\,|g(x)|$


for all values of x in some neighborhood of $x_0$.


2. We write
$f(x) = o(g(x)), \quad x \to x_0$
if
$\lim_{x \to x_0} \left| \frac{f(x)}{g(x)} \right| = 0$.

3. We write
$f(x) \sim g(x), \quad x \to x_0$
if


$\lim_{x \to x_0} \frac{f(x)}{g(x)} = 1$.


In addition to the formal definitions above, we summarize the actual
meaning of these symbols:


1. $f(x) = O(g(x))$ means that f(x) does not grow faster than g(x) as $x \to x_0$.
2. $f(x) = o(g(x))$ means that f(x) grows more slowly than g(x) as $x \to x_0$.
3. $f(x) \sim g(x)$ means that f(x) and g(x) grow at the same rate as $x \to x_0$.


We occasionally employ the symbol
$f(x) = O(1)$ as $x \to x_0$.

This simply means that f(x) is bounded, i.e., of the order of 1, near $x_0$. The symbol
$f(x) = o(1)$ as $x \to x_0$

means that f(x) approaches zero as $x \to x_0$.


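Examples The following relations hold for $x \to \infty$:


1. $\frac{1}{1 + x^2} = O\left(\frac{1}{x^2}\right)$, $\frac{1}{1 + x^2} = o\left(\frac{1}{x}\right)$, $\frac{1}{1 + x^2} \sim \frac{1}{x^2}$.


2. $\frac{1}{1 + x^2} = \frac{1}{x^2} + O\left(\frac{1}{x^4}\right)$, $\frac{1}{1 + x^2} = \frac{1}{x^2} + o\left(\frac{1}{x^2}\right)$.


The following hold for $x \to 0$:
3. $\sin x = O(1)$, $\sin x \sim x$, $\cos x = 1 + O(x^2)$.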
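
Order relations of this kind can be probed numerically by tabulating the relevant ratios. The Python sketch below is an illustration under assumptions made here (the sample points and variable names are not from the text): a ratio tending to 1 indicates ~, a ratio tending to 0 indicates o, and a ratio that stays bounded indicates O. A finite table suggests, but does not prove, the asymptotic statement.

```python
import math


def f(x):
    """The test function 1 / (1 + x^2)."""
    return 1.0 / (1.0 + x * x)


large = [10.0**k for k in range(1, 5)]   # x = 10, ..., 1e4, probing x -> infinity
small = [10.0**(-k) for k in range(1, 5)]  # x = 0.1, ..., 1e-4, probing x -> 0

# f ~ 1/x^2: the ratio f(x) / (1/x^2) should approach 1.
ratio_sim = [f(x) * x * x for x in large]

# f = o(1/x): the ratio f(x) / (1/x) should approach 0.
ratio_small_o = [f(x) * x for x in large]

# f = 1/x^2 + O(1/x^4): |f(x) - 1/x^2| * x^4 should stay bounded.
ratio_big_o = [abs(f(x) - 1.0 / (x * x)) * x**4 for x in large]

# sin x ~ x and cos x = 1 + O(x^2) as x -> 0.
sin_ratio = [math.sin(x) / x for x in small]
cos_ratio = [abs(math.cos(x) - 1.0) / (x * x) for x in small]

for row in zip(large, ratio_sim, ratio_small_o, ratio_big_o):
    print(row)
for row in zip(small, sin_ratio, cos_ratio):
    print(row)
```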


1.3.2 Asymptotic Behavior 


Asymptotic behavior of f(x) as $x \to a$ can be quantified by using the powers
of (x − a) as comparison functions. As an example, suppose that a function
f(x) satisfies the relation


$f(x) = O((x - a)^p)$ for $x \to a$ (1.2)


for some real number $p = p_0$. Then, the relation (1.2) clearly holds for all
$p < p_0$, and it may or may not hold for some p if $p > p_0$. Thus we can
define the supremum of such p's that satisfy (1.2), and denote it by q, i.e.,


$q = \sup\{p \mid f(x) = O((x - a)^p)\}$. (1.3)


In this case, we say that f vanishes at x = a to order q. The quantity q
defined by (1.3) is useful for describing the asymptotic behavior of f(x) in the
vicinity of x = a.




Remark. Note that (1.3) itself does not imply that
$f(x) = O((x - a)^q)$, $x \to a$.


For instance, the function $f(x) = \log x$ defined within the interval (0, 1) yields
q = 0, since for $x \to 0$,


$\log x \begin{cases} = O(x^p), & p < 0, \\ \neq O(x^p), & p > 0. \end{cases}$


But it is obvious that $\log x \neq O(1)$.
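
The behavior of log x near the origin can be tabulated directly. In the Python sketch below (an illustration only; the exponent p = −0.1, the sample points, and the helper name ratio are assumptions made here), the quantity $|\log x| / x^p$ stays bounded for a negative p but grows without bound for p = 0, which is the situation described in the remark.

```python
import math


def ratio(x, p):
    """|log x| / x**p, the quantity that must stay bounded for log x = O(x**p)."""
    return abs(math.log(x)) / x**p


xs = [10.0**(-k) for k in range(1, 9)]       # x -> 0 from the right

bounded_case = [ratio(x, -0.1) for x in xs]   # p = -0.1 < 0: stays bounded
critical_case = [ratio(x, 0.0) for x in xs]   # p = 0: this is |log x|, unbounded

for x, b, c in zip(xs, bounded_case, critical_case):
    print(x, b, c)
```

For p = −0.1 the ratio $|\log x|\,x^{0.1}$ attains its maximum 10/e at $x = e^{-10}$, so the tabulated values never exceed about 3.68.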


1.4 Values of Indeterminate Forms 


1.4.1 l'Hôpital's Rule


A function f(x) of the form u(x)/v(x) is not defined for x = a if f(a) takes
the form 0/0. Still, if the limit $\lim_{x \to a} f(x)$ exists, then it is often desirable
to define $f(a) = \lim_{x \to a} f(x)$. In such a case, the value of the limit can be
evaluated by using the following theorem:


♠ l'Hôpital's rule:

Let u(a) = v(a) = 0. If there exists a neighborhood of x = a such that
(i) $v(x) \neq 0$ except for x = a, and
(ii) u'(x) and v'(x) exist and do not vanish simultaneously, then


$\lim_{x \to a} \frac{u(x)}{v(x)} = \lim_{x \to a} \frac{u'(x)}{v'(x)}$


whenever the limit on the right exists.


For the proof of the theorem, see Exercise 3 in Sect. 8.1. 


Remark. If u'(x)/v'(x) is itself an indeterminate form, the above method may
be applied to u'(x)/v'(x) in turn, so that


$\lim_{x \to a} \frac{u(x)}{v(x)} = \lim_{x \to a} \frac{u'(x)}{v'(x)} = \lim_{x \to a} \frac{u''(x)}{v''(x)}$.


If necessary, this process may be continued.
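
As a numerical sketch of the repeated application, consider the example $u(x) = 1 - \cos x$, $v(x) = x^2$ at a = 0 (this example and all function names below are assumptions chosen here for illustration, not taken from the text). Both u/v and u'/v' are of the form 0/0 at the origin, while u''/v'' can be evaluated directly and gives 1/2.

```python
import math


def u(x):
    return 1.0 - math.cos(x)


def v(x):
    return x * x


def du(x):   # u'(x)
    return math.sin(x)


def dv(x):   # v'(x)
    return 2.0 * x


def d2u(x):  # u''(x)
    return math.cos(x)


def d2v(x):  # v''(x)
    return 2.0


# u/v and u'/v' are 0/0 at x = 0, so they are sampled near the origin instead.
for x in (1e-1, 1e-2, 1e-3):
    print(x, u(x) / v(x), du(x) / dv(x), d2u(x) / d2v(x))

# After two applications of the rule the ratio can be evaluated at x = 0 itself.
limit = d2u(0.0) / d2v(0.0)
print(limit)
```

The three columns agree ever more closely as x decreases, consistent with the chain of equalities in the remark.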


1.4.2 Several Examples 


In the following, we show several examples of indeterminate forms other
than the form of 0/0 previously discussed. Often functions f(x) of the forms
$u(x)v(x)$, $[u(x)]^{v(x)}$, and $u(x) - v(x)$ can be reduced to the form p(x)/q(x)
with the aid of the following relations:


$u(x)v(x) = \frac{u(x)}{1/v(x)} = \frac{v(x)}{1/u(x)}$,


$[u(x)]^{v(x)} = e^{g(x)}$, where $g(x) = \frac{\log u(x)}{1/v(x)}$,


$u(x) - v(x) = \log h(x)$, where $h(x) = \frac{e^{u(x)}}{e^{v(x)}}$.


After the reduction, the l'Hôpital method given in Sect. 1.4.1 becomes
applicable.
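
The reduction of a power form can be sketched with the example $(1 + x)^{1/x}$ as $x \to 0$ (an example chosen here for illustration; the function names are assumptions). With u(x) = 1 + x and v(x) = 1/x, the exponent $g(x) = \log(1 + x)/x$ is of the form 0/0, and the ratio of derivatives is 1/(1 + x), which tends to 1. The original expression therefore tends to $e^1$.

```python
import math


def g(x):
    """g(x) = log u(x) / (1 / v(x)) with u(x) = 1 + x and v(x) = 1/x."""
    return math.log1p(x) / x


def g_after_rule(x):
    """Ratio of derivatives: (d/dx log(1 + x)) / (d/dx x) = 1 / (1 + x)."""
    return 1.0 / (1.0 + x)


for x in (1e-1, 1e-3, 1e-5):
    direct = (1.0 + x) ** (1.0 / x)          # the original power form
    print(x, direct, math.exp(g(x)), math.exp(g_after_rule(x)))

limit = math.exp(g_after_rule(0.0))          # e**1
print(limit)
```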


Part I 


Real Analysis 


2 


Real Sequences and Series 


Abstract In this chapter, we deal with the fundamental properties of sequences and
series of real numbers. We place particular emphasis on the concept of “convergence,”
a thorough understanding of which is important for the study of the various branches
of mathematical physics that we are concerned with in subsequent chapters.


2.1 Sequences of Real Numbers 


2.1.1 Convergence of a Sequence 


This section describes the fundamental definitions and ideas associated with
sequences of real numbers (called real sequences). We must emphasize
that the sequence

$(x_n : n \in \mathbb{N})$


is not the same as the set
$\{x_n : n \in \mathbb{N}\}$.


In fact, the former is the ordered list of $x_n$, some of which may be repeated,
whereas the latter is merely the defining range of $x_n$. For instance, the constant
sequence $x_n = 1$ is denoted by $(1, 1, 1, \cdots)$, whereas the set {1} contains only
one element.

We start with a precise definition of the convergence of a real sequence, 
which is an initial and crucial step for various branches of mathematics. 


♠ Convergence of a real sequence:

A real sequence $(x_n)$ is said to be convergent if there exists a real
number x with the following property: For every $\varepsilon > 0$, there is an integer
N such that

$n > N \ \Rightarrow\ |x_n - x| < \varepsilon$. (2.1)


We must emphasize that the magnitude of $\varepsilon$ is arbitrary. No matter how
small an $\varepsilon$ we choose, it must always be possible to find a suitable number N,
which in general will increase as $\varepsilon$ decreases.


Remark. In the language of neighborhoods, the above definition is stated as
follows: The sequence $(x_n)$ converges to x if every neighborhood of x contains
all but a finite number of elements of the sequence.


When $(x_n)$ is convergent, the number x specified in this definition is called
a limit of the sequence $(x_n)$, and we say that $x_n$ converges to x. This is
expressed symbolically by writing
$\lim_{n \to \infty} x_n = x$,
or simply by
$x_n \to x$.


If $(x_n)$ is not convergent, it is called divergent.
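
The role of N in this definition can be sketched for the sequence $x_n = 1/n$, which converges to 0 (this example and the helper names find_N and check are assumptions introduced here for illustration). Since $1/n < \varepsilon$ whenever $n > 1/\varepsilon$, the integer part of $1/\varepsilon$ is an admissible N, and it grows as $\varepsilon$ shrinks.

```python
import math


def find_N(eps):
    """Return an N such that n > N implies |1/n - 0| < eps."""
    return math.floor(1.0 / eps)


def check(eps, how_many=1000):
    """Spot-check the implication n > N => |x_n - x| < eps on terms beyond N."""
    N = find_N(eps)
    return all(abs(1.0 / n - 0.0) < eps for n in range(N + 1, N + 1 + how_many))


for eps in (0.5, 0.1, 0.01, 0.001):
    print(eps, find_N(eps), check(eps))
```

The check inspects only finitely many terms, so it illustrates the definition rather than proving convergence.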


Remark. The limit x may or may not belong to $(x_n)$; this situation is similar
to the case of the limit point of a set of real numbers discussed in Sect. 1.1.5.


An example in which $x = \lim x_n$ but $x \neq x_n$ for any n is given below.


Examples Suppose that a sequence $(x_n)$ consisting of rational numbers is
defined by

$(x_n) = (3.1, 3.14, 3.142, \cdots)$,
where $x_n \in \mathbb{Q}$ is a rational number to n decimal places close to $\pi$. Since the
difference $|x_n - \pi|$ is less than $10^{-n}$, it is possible to find an N for any $\varepsilon > 0$
such that

$n > N \ \Rightarrow\ |x_n - \pi| < \varepsilon$.

This means that


$\lim_{n \to \infty} x_n = \pi$.


However, as the limit, $\pi$, is an irrational number, it is not in $\mathbb{Q}$.


Remark. The above example indicates that only a restricted class of convergent
sequences has a limit in the same sequence.


2.1.2 Bounded Sequences 


In the remainder of this section, we present several fundamental concepts 
associated with real sequences. We start with the boundedness properties of 
sequences. 




♠ Bounded sequences:
A real sequence $(x_n)$ is said to be bounded if there is a positive number
M such that
$|x_n| \le M$ for all $n \in \mathbb{N}$.


The following is an important relation between convergence and boundedness
of a real sequence:


♠ Theorem:
If a sequence is convergent, then it is bounded.


Proof Suppose that $x_n \to x$. If we choose $\varepsilon = 1$ in (2.1), there exists an integer
N such that
$|x_n - x| < 1$ for all $n > N$.


Since $|x_n| - |x| \le |x_n - x|$, it follows that
$|x_n| < 1 + |x|$ for all $n > N$.
Setting $M = \max\{|x_1|, |x_2|, \cdots, |x_N|, 1 + |x|\}$ yields
$|x_n| \le M$ for all $n \in \mathbb{N}$,
which means that $(x_n)$ is bounded. ♣


Remark. Observe that the converse of the theorem is false. In fact, the sequence
$(1, -1, 1, -1, \cdots)$


is divergent, although it is bounded.


2.1.3 Monotonic Sequences 
Another important concept in connection with real sequences is monotonicity,
defined as follows:


♠ Monotonic sequences:
A sequence $(x_n)$ is said to be
1. increasing (or monotonically increasing) if $x_{n+1} \ge x_n$ for all $n \in \mathbb{N}$,


2. strictly increasing if $x_{n+1} > x_n$ for all $n \in \mathbb{N}$,


3. decreasing (or monotonically decreasing) if $x_{n+1} \le x_n$ for all $n \in \mathbb{N}$,
and


4. strictly decreasing if $x_{n+1} < x_n$ for all $n \in \mathbb{N}$.


These four kinds of sequences are collectively known as monotonic
sequences. Note that a sequence $(x_n)$ is increasing if and only if $(-x_n)$ is
decreasing. Thus, the properties of monotonic sequences can be fully
investigated by restricting ourselves solely to increasing (or decreasing) sequences.

Once a sequence assumes monotonic properties, its convergence is
determined only by its boundedness, as stated below.


♠ Theorem:

A monotonic sequence is convergent if and only if it is bounded. More
specifically,
(i) If $(x_n)$ is increasing and bounded above, then its limit is given by


$\lim_{n \to \infty} x_n = \sup x_n$.


(ii) If $(x_n)$ is decreasing and bounded below, then


$\lim_{n \to \infty} x_n = \inf x_n$.


Proof If $(x_n)$ is convergent, then it must be bounded as proven earlier (see
Sect. 2.1.2). Now we consider the converses for cases (i) and (ii).


(i) Assume $(x_n)$ is increasing and bounded. The set $S = \{x_n\}$ will then have
the supremum denoted by $\sup S = x$. By the definition of the supremum,
for arbitrarily small $\varepsilon > 0$ there is an $x_N \in S$ such that


$x_N > x - \varepsilon$. (2.2)
Since $(x_n)$ is increasing, we obtain
$x_n \ge x_N$ for all $n \ge N$. (2.3)
Moreover, since x is the supremum of S, we have
$x \ge x_n$ for all $n \in \mathbb{N}$. (2.4)
From (2.2), (2.3), and (2.4), we arrive at


$|x_n - x| = x - x_n \le x - x_N < \varepsilon$ for all $n \ge N$,


which gives us the desired conclusion, i.e.,


$\lim_{n \to \infty} x_n = x = \sup S$.


(ii) If $(x_n)$ is decreasing and bounded, then $(-x_n)$ is increasing and bounded.
Hence, from (i), we have


$\lim_{n \to \infty} (-x_n) = \sup(-S)$.


Since $\sup(-S) = -\inf S$, it follows that
$\lim_{n \to \infty} x_n = \inf S$. ♣
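
The mechanism of the proof can be traced on a concrete increasing sequence. The Python sketch below uses $x_n = n/(n+1)$, whose supremum is 1 (the example, the cutoff of 2000 terms, and the helper names are assumptions made here for illustration). It locates the first index N at which $x_N$ comes within $\varepsilon$ of the supremum and confirms that the later terms inspected stay within $\varepsilon$.

```python
from fractions import Fraction


def x(n):
    """Increasing sequence x_n = n / (n + 1), bounded above by 1."""
    return Fraction(n, n + 1)


terms = [x(n) for n in range(1, 2001)]
is_increasing = all(a < b for a, b in zip(terms, terms[1:]))
sup_S = Fraction(1)  # supremum of {x_n}; it is not attained by any term


def first_index_within(eps):
    """Smallest N (among the first 2000 indices) with sup_S - x_N < eps."""
    return next(n for n in range(1, 2001) if sup_S - x(n) < eps)


eps = Fraction(1, 100)
N = first_index_within(eps)
stays_within = all(sup_S - x(n) < eps for n in range(N, 2001))
print(is_increasing, N, stays_within)
```

Exact fractions are used so that the strict inequalities are tested without rounding error.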


2.1.4 Limit Superior and Limit Inferior 


We close this section by introducing two specific limits of any bounded
sequence. Let $(x_n)$ be a bounded sequence and define two sequences $(y_n)$
and $(z_n)$ as follows:


$y_n = \sup\{x_k : k \ge n\}$, $z_n = \inf\{x_k : k \ge n\}$. (2.5)


Note that $y_n$ and $z_n$ differ, respectively, from $\sup\{x_n\}$ and $\inf\{x_n\}$. It follows
from (2.5) that


$y_1 = \sup\{x_k : k \ge 1\} \ge y_2 = \sup\{x_k : k \ge 2\} \ge y_3 \ge \cdots$,
which means that the sequence $(y_n)$ is monotonically decreasing and bounded
below by $\inf x_n$. Thus in view of the theorem in Sect. 2.1.3, the sequence $(y_n)$
must be convergent. The limit of $(y_n)$ is called the limit superior or the
upper limit of $(x_n)$ and is denoted by
$\limsup_{n \to \infty} x_n$ (or $\overline{\lim}\, x_n$).

Likewise, since $(z_n)$ is increasing and bounded above by $\sup x_n$, it possesses
the limit known as the limit inferior or lower limit of $(x_n)$, denoted by


$\liminf_{n \to \infty} x_n$ (or $\underline{\lim}\, x_n$).


In terms of the two specific limits, we can say that a bounded sequence $(x_n)$
converges if and only if


$\lim_{n \to \infty} x_n = \limsup_{n \to \infty} x_n = \liminf_{n \to \infty} x_n$.


(A proof will be given in Exercise 4 in Sect. 2.1.4.) Note that by definition, it
readily follows that
$\limsup_{n \to \infty} x_n \ge \liminf_{n \to \infty} x_n$ and


$\limsup_{n \to \infty} (-x_n) = -\liminf_{n \to \infty} x_n$.
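
The tail suprema and infima can be approximated numerically for a concrete bounded sequence. The Python sketch below uses $x_n = (-1)^n + 1/n$ (the example, the truncation length K, and the helper names are assumptions made here for illustration). Each tail is cut off at index K, so the computed values only approximate the true supremum and infimum of the infinite tail.

```python
def x(n):
    """Bounded, non-convergent sequence x_n = (-1)^n + 1/n."""
    return (-1) ** n + 1.0 / n


K = 10_000  # truncation length: an assumption of this sketch
terms = [x(n) for n in range(1, K + 1)]


def tail_sup(n):
    """Approximation of y_n = sup{x_k : k >= n}, using the window n..K."""
    return max(terms[n - 1:])


def tail_inf(n):
    """Approximation of z_n = inf{x_k : k >= n}, using the window n..K."""
    return min(terms[n - 1:])


indices = (1, 10, 100, 1000)
ys = [tail_sup(n) for n in indices]
zs = [tail_inf(n) for n in indices]

for n, y, z in zip(indices, ys, zs):
    print(n, y, z)
```

The values of tail_sup decrease toward 1. The true tail infimum is −1 for every n but is never attained, so the truncated values sit slightly above −1; this is an artifact of the finite window.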




Examples 1. x, =(-1)" => limsupz, =1, liminfz, = —1. 
ito n—-00 

1 ’ eee 
2. tv, =(-1)"+-— => limsupz, =1, liminfaz, = —-1. 

n n—oo n—-co 

—1)" —1)” 
3. Lon = 1+ (=1) 5 Bon (=1) , = limsupz, =1, liminfz, =0. 
n n ESOS n—0o 


4, (ap) = (2,0, -2,2,0,-2,---) = limsupez, = 2, liminfx, = —2. 
n—-0o nc 
The four cases noted above are illustrated schematically in Fig. 2.1. None of the sequences $(x_n)$ is convergent, and thus the limit $\lim_{n\to\infty} x_n$ does not exist. This fact clarifies the crucial difference between $\lim_{n\to\infty} x_n$ and $\limsup_{n\to\infty} x_n$ (or $\liminf_{n\to\infty} x_n$).
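The definition (2.5) can be explored numerically. The following is a minimal sketch, not part of the original text: it truncates each sequence to finitely many terms (an assumption, so the values are estimates of the tail suprema and infima rather than exact limits) and evaluates $y_n$ and $z_n$ for Examples 1, 2, and 4. The function names `tail_sup_inf` and `example_terms` are hypothetical.

```python
def tail_sup_inf(terms):
    """Return (y, z) with y[n] = max(terms[n:]) and z[n] = min(terms[n:])."""
    y, z = [], []
    running_max, running_min = float("-inf"), float("inf")
    # Scan from the end so that every tail extremum is obtained in one pass.
    for value in reversed(terms):
        running_max = max(running_max, value)
        running_min = min(running_min, value)
        y.append(running_max)
        z.append(running_min)
    y.reverse()
    z.reverse()
    return y, z


def example_terms(which, count=2000):
    """First `count` terms (n = 1, 2, ...) of Examples 1, 2 and 4."""
    if which == 1:
        return [(-1) ** n for n in range(1, count + 1)]
    if which == 2:
        return [(-1) ** n + 1 / n for n in range(1, count + 1)]
    if which == 4:
        pattern = [2, 0, -2]
        return [pattern[(n - 1) % 3] for n in range(1, count + 1)]
    raise ValueError("unknown example")


if __name__ == "__main__":
    for which in (1, 2, 4):
        y, z = tail_sup_inf(example_terms(which))
        # Read the estimates at the midpoint, away from the truncation end.
        print(which, y[len(y) // 2], z[len(z) // 2])
```

The computed list `y` is nonincreasing and `z` is nondecreasing, which mirrors the monotonicity used above to show that both limits exist.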


Fig. 2.1. All the sequences $(x_n)$ in the figures do not converge, but they all possess $\limsup_{n\to\infty} x_n = 1$ and $\liminf_{n\to\infty} x_n = -1$


The limit superior of x, has the following features and similar features are 
found for the limit inferior. 




♠ Theorem:
1. For any small $\varepsilon > 0$, we can find an $N$ such that

$$n > N \;\Rightarrow\; x_n < \limsup_{n\to\infty} x_n + \varepsilon.$$

2. For any small $\varepsilon > 0$, there are an infinite number of terms of $x_n$ such that

$$\limsup_{n\to\infty} x_n - \varepsilon < x_n.$$


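Proof 1. Recall that $\limsup_{n\to\infty} x_n = \lim_{n\to\infty} y_n$, where $y_n$ is defined in (2.5). For any $\varepsilon > 0$, there is an integer $N$ such that

$$n > N \;\Rightarrow\; \limsup_{n\to\infty} x_n - \varepsilon < y_n < \limsup_{n\to\infty} x_n + \varepsilon.$$

Since $y_n \ge x_n$ for all $n$, we have

$$n > N \;\Rightarrow\; x_n < \limsup_{n\to\infty} x_n + \varepsilon.$$ ♣

2. Suppose that there is an integer $m$ such that

$$n > m \;\Rightarrow\; \limsup_{n\to\infty} x_n - \varepsilon \ge x_n.$$

Then for all $k \ge n > m$, we have

$$x_k \le \limsup_{n\to\infty} x_n - \varepsilon,$$

which means that

$$y_n \le \limsup_{n\to\infty} x_n - \varepsilon \quad \text{for all } n > m.$$

In the limit of $n \to \infty$, we find a contradiction such that

$$\limsup_{n\to\infty} x_n \le \limsup_{n\to\infty} x_n - \varepsilon.$$

This completes the proof. ♣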


Exercises 


1. Prove that if the sequence $(x_n)$ is convergent, then its limit is unique.

Solution: Let $x = \lim x_n$ and $y = \lim x_n$ with the assumption $x \ne y$. Then we can find a neighborhood $V_1$ of $x$ and a neighborhood $V_2$ of $y$ such that $V_1 \cap V_2 = \emptyset$. For example, take $V_1 = (x - \varepsilon, x + \varepsilon)$ and $V_2 = (y - \varepsilon, y + \varepsilon)$, where $\varepsilon = |x - y|/2$. Since $x_n \to x$, all but a finite number of terms of the sequence lie in $V_1$. Similarly, since $x_n \to y$, all but a finite number of its terms also lie in $V_2$. However, these results contradict the fact that $V_1 \cap V_2 = \emptyset$, which means that the limit of a sequence must be unique. ♣
2. If $x_n \to x \ne 0$, then there is a positive number $A$ and an integer $N$ such that $n > N \Rightarrow |x_n| > A$. Prove it.

Solution: Let $\varepsilon = |x|/2$, which is a positive number. Hence, there is an integer $N$ such that $n > N \Rightarrow |x_n - x| < \varepsilon \Rightarrow \big||x_n| - |x|\big| < \varepsilon$. Consequently, $|x| - \varepsilon < |x_n| < |x| + \varepsilon$ for all $n > N$. From the left-hand inequality, we see that $|x_n| > |x|/2$, and we can take $A = |x|/2$ to complete the proof. ♣


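3. Prove that the sequence $x_n = [1 + (1/n)]^n$ is convergent.

Solution: The proof is completed by observing that the sequence is monotonically increasing and bounded. To see this, we use the binomial theorem, which gives

$$x_n = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n}\right) + \frac{1}{3!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right) + \cdots + \frac{1}{n!}\left(1 - \frac{1}{n}\right)\left(1 - \frac{2}{n}\right)\cdots\left(1 - \frac{n-1}{n}\right).$$

Likewise we have

$$x_{n+1} = 1 + 1 + \frac{1}{2!}\left(1 - \frac{1}{n+1}\right) + \frac{1}{3!}\left(1 - \frac{1}{n+1}\right)\left(1 - \frac{2}{n+1}\right) + \cdots + \frac{1}{(n+1)!}\left(1 - \frac{1}{n+1}\right)\left(1 - \frac{2}{n+1}\right)\cdots\left(1 - \frac{n}{n+1}\right).$$

Comparing these expressions for $x_n$ and $x_{n+1}$, we see that every term in $x_n$ is no more than the corresponding term in $x_{n+1}$. In addition, $x_{n+1}$ has an extra positive term. We thus conclude that $x_{n+1} > x_n$ for all $n \in \mathbb{N}$, which means that the sequence $(x_n)$ is monotonically increasing.

We next prove boundedness. For every $n \in \mathbb{N}$, we have $x_n \le \sum_{k=0}^{n} (1/k!)$. Using the inequality $2^{n-1} \le n!$ for $n \ge 1$ (which can be easily seen by taking the logarithm of both sides), we obtain

$$x_n \le 1 + \sum_{k=1}^{n} \frac{1}{2^{k-1}} = 1 + \frac{1 - (1/2)^n}{1 - (1/2)} < 3.$$

Thus $(x_n)$ is bounded above by 3. Hence, in view of the theorem in Sect. 2.1.3, the sequence is convergent. ♣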
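4. Denote $\overline{x} = \limsup x_n$ and $\underline{x} = \liminf x_n$. Prove that a sequence $(x_n)$ converges to $x$ if and only if $\underline{x} = \overline{x} = x$.

Solution: In view of the theorem in Sect. 2.1.4, it follows that $(-\infty, \overline{x} + \varepsilon)$ contains all but a finite number of terms of $(x_n)$. The same property applied to $(-x_n)$ implies that $(\underline{x} - \varepsilon, \infty)$ contains all but a finite number of such terms. If $\underline{x} = \overline{x} = x$, then $(x - \varepsilon, x + \varepsilon)$ contains all but a finite number of terms of $(x_n)$. This is the assertion that $x_n \to x$.

Now suppose that $x_n \to x$. For any $\varepsilon > 0$, there is an integer $N$ such that $n > N \Rightarrow x_n < x + \varepsilon \Rightarrow y_n \le x + \varepsilon$, where $y_n = \sup\{x_k : k \ge n\}$, as was introduced in (2.5). Hence, $\overline{x} \le x + \varepsilon$. Since $\varepsilon > 0$ is arbitrary, we obtain $\overline{x} \le x$. Working with the sequence $(-x_n)$, whose limit is $-x$, and following the same procedure, we get $\underline{x} \ge x$. Since $\underline{x} \le \overline{x}$, we conclude that $\underline{x} = \overline{x} = x$. ♣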
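The monotonicity and boundedness claimed in Exercise 3 can be checked for the first few terms. The sketch below is illustrative only: it evaluates $x_n = (1 + 1/n)^n$ exactly with rational arithmetic for $n \le 50$ (the cutoff and the function name `x` are assumptions), so it supports the argument without replacing the proof.

```python
from fractions import Fraction


def x(n):
    """Exact value of (1 + 1/n)**n as a rational number."""
    return (1 + Fraction(1, n)) ** n


if __name__ == "__main__":
    values = [x(n) for n in range(1, 51)]
    # Each term should exceed its predecessor (monotone increase).
    print(all(a < b for a, b in zip(values, values[1:])))
    # Every term should stay below the bound 3 derived in Exercise 3.
    print(all(v < 3 for v in values))
    print(float(values[-1]))
```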


2.2 Cauchy Criterion for Real Sequences 


2.2.1 Cauchy Sequence 


To test the convergence of a general (nonmonotonic) real sequence, we have 
thus far only the original definition given in Sect. 2.1.1 to rely on; in that 
case we must first have a candidate for the limit of the sequence in question 
before we can examine its convergence. Needless to say, it is more convenient 
if we can determine the convergence property of a sequence without having to 
guess its limit. This is achieved by applying the so-called Cauchy criterion, 
which plays a central role in developing the fundamentals of real analysis. 
To begin with, we present a preliminary notion for subsequent discussions. 


♠ Cauchy sequence:

The sequence $(x_n)$ is called a Cauchy sequence (or fundamental sequence) if for every positive number $\varepsilon$, there is a positive integer $N$ such that

$$m, n > N \;\Rightarrow\; |x_n - x_m| < \varepsilon. \tag{2.6}$$


This means that in every Cauchy sequence, the terms can be as close to one 
another as we like. This feature of Cauchy sequences is expected to hold for 
any convergent sequence, since the terms of a convergent sequence have to 
approach each other as they approach a common limit. This conjecture is 
ensured in part by the following theorem. 




♠ Theorem:
If a sequence $(x_n)$ is convergent, then it is a Cauchy sequence.


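Proof Suppose $\lim x_n = x$ and $\varepsilon$ is any positive number. From hypothesis, there exists a positive integer $N$ such that

$$n > N \;\Rightarrow\; |x_n - x| < \frac{\varepsilon}{2}.$$

Now if we take $m, n > N$, then

$$|x_n - x| < \frac{\varepsilon}{2} \quad \text{and} \quad |x_m - x| < \frac{\varepsilon}{2}.$$

It thus follows that

$$|x_n - x_m| \le |x_m - x| + |x_n - x| < \varepsilon,$$

which means that $(x_n)$ is a Cauchy sequence. ♣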


This theorem naturally gives rise to the question of whether its converse is true. In other words, we would like to know whether all Cauchy sequences are convergent or not. The answer is exactly what the Cauchy criterion states, as we prove in the next subsection.


2.2.2 Cauchy Criterion 


The following is one of the fundamental theorems of real sequences. 


♠ Cauchy criterion:
A sequence of real numbers is convergent if and only if it is a Cauchy sequence.


Bear in mind that the validity of this criterion was partly proven by demon- 
strating the previous theorem (see Sect. 2.2.1). Hence, in order to complete 
the proof of the criterion, we need only prove that every Cauchy sequence is 
convergent. The following serves as a lemma for developing the proof. 


♠ Bolzano-Weierstrass theorem:
Every infinite and bounded sequence of real numbers has at least one limit point in $\mathbb{R}$. (The proof is given in Appendix A.)


We are now ready to prove that every Cauchy sequence is convergent. 




Proof (of the Cauchy criterion): Let $(x_n)$ be a Cauchy sequence and $S = \{x_n : n \in \mathbb{N}\}$. We consider two cases in turn: (i) the set $S$ is finite, and (ii) $S$ is infinite.

(i) It follows from the hypothesis that given $\varepsilon > 0$, there is an integer $N$ such that

$$m, n > N \;\Rightarrow\; |x_n - x_m| < \varepsilon. \tag{2.7}$$

Since $S$ is finite, one of the terms of the sequence $(x_n)$, say $x$, should be repeated infinitely often in order to satisfy (2.7). This implies the existence of an $m > N$ such that $x_m = x$. Hence, we have

$$n > N \;\Rightarrow\; |x_n - x| < \varepsilon,$$

which means that $x_n \to x$.

(ii) Next we consider the case that $S$ is infinite. It can be shown that every Cauchy sequence is bounded (see Exercise 1). Hence, in view of the Bolzano-Weierstrass theorem, the sequence $(x_n)$ necessarily has a limit point $x$. We shall prove that $x_n \to x$. Given $\varepsilon > 0$, there is an integer $N$ such that

$$m, n > N \;\Rightarrow\; |x_n - x_m| < \varepsilon.$$

From the definition of a limit point, we see that the interval $(x - \varepsilon, x + \varepsilon)$ contains an infinite number of terms of the sequence $(x_n)$. Hence, there is an $m > N$ such that $x_m \in (x - \varepsilon, x + \varepsilon)$, i.e., such that $|x_m - x| < \varepsilon$. Now, if $n > N$, then

$$|x_n - x| \le |x_n - x_m| + |x_m - x| < \varepsilon + \varepsilon = 2\varepsilon,$$

which proves $x_n \to x$.


The results for (i) and (ii) shown above indicate that every Cauchy sequence is convergent, whether the set $S$ is finite or infinite. Recall again that the converse, i.e., that every convergent sequence is a Cauchy sequence, was proven earlier in Sect. 2.2.1. This completes the proof of the Cauchy criterion. ♣


Exercises 


1. Show that every Cauchy sequence is bounded. 


Solution: Let $(x_n)$ be a Cauchy sequence. Taking $\varepsilon = 1$, there is an integer $N$ such that

$$n > N \;\Rightarrow\; |x_n - x_N| < 1.$$

Since $|x_n| - |x_N| \le |x_n - x_N|$, we have

$$n > N \;\Rightarrow\; |x_n| < |x_N| + 1.$$

Thus $|x_n|$ is bounded by $\max\{|x_1|, |x_2|, \cdots, |x_{N-1}|, |x_N| + 1\}$. ♣


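2. Let $x_1 = 1$, $x_2 = 2$, and $x_n = (x_{n-1} + x_{n-2})/2$ for all $n \ge 3$. Show that $(x_n)$ is a Cauchy sequence.

Solution: Since for $n \ge 3$, $x_n - x_{n-1} = -(x_{n-1} - x_{n-2})/2$, we use induction on $n$ to obtain $x_n - x_{n+1} = (-1)^n/2^{n-1}$ for all $n \in \mathbb{N}$. Hence, if $m > n$, then

$$|x_n - x_m| \le |x_n - x_{n+1}| + |x_{n+1} - x_{n+2}| + \cdots + |x_{m-1} - x_m|$$
$$= \sum_{k=n}^{m-1} \frac{1}{2^{k-1}} = \frac{1}{2^{n-1}} \sum_{k=0}^{m-n-1} \frac{1}{2^k}$$
$$= \frac{1}{2^{n-1}} \cdot \frac{1 - (1/2)^{m-n}}{1 - (1/2)} < \frac{1}{2^{n-1}} \cdot \frac{1}{1 - (1/2)} = \frac{1}{2^{n-2}}.$$

Since $1/2^{n-2}$ decreases monotonically with $n$, it is possible to choose $N$ for any $\varepsilon > 0$ such that $1/2^{N-2} < \varepsilon$. We thus conclude that

$$m > n \ge N \;\Rightarrow\; |x_m - x_n| < \left(\frac{1}{2}\right)^{n-2} \le \left(\frac{1}{2}\right)^{N-2} < \varepsilon,$$

which means that $(x_n)$ is a Cauchy sequence. ♣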


3. Suppose that the two sequences $(x_n)$ and $(y_n)$ converge to a common limit $c$ and consider their shuffled sequence $(z_n)$ defined by

$$(z_1, z_2, z_3, z_4, \cdots) = (x_1, y_1, x_2, y_2, \cdots).$$

Show that the sequence $(z_n)$ also converges to $c$.

Solution: Let $\varepsilon$ be any positive number. Since $x_n \to c$ and $y_n \to c$, there are two positive integers $N_1$ and $N_2$ such that

$$n \ge N_1 \;\Rightarrow\; |x_n - c| < \varepsilon \quad \text{and} \quad n \ge N_2 \;\Rightarrow\; |y_n - c| < \varepsilon.$$

Define $N = \max\{N_1, N_2\}$. Since $x_k = z_{2k-1}$ and $y_k = z_{2k}$ for all $k \in \mathbb{N}$, we have

$$k \ge N \;\Rightarrow\; |x_k - c| = |z_{2k-1} - c| < \varepsilon \quad \text{and} \quad |y_k - c| = |z_{2k} - c| < \varepsilon.$$

Hence, $n \ge 2N - 1 \Rightarrow |z_n - c| < \varepsilon$, which just means $\lim z_n = c$. ♣


4. Show that $\lim_{n\to\infty} (a^n/n^k) = \infty$, where $a > 1$ and $k > 0$.

Solution: We consider three cases in turn: (i) $k = 1$, (ii) $k < 1$, and (iii) $k > 1$.

(i) Let $k = 1$. Then set $a = 1 + h$ to obtain

$$a^n = (1 + h)^n = 1 + nh + \frac{n(n-1)}{2} h^2 + \cdots > \frac{n(n-1)}{2} h^2,$$

which results in

$$\frac{a^n}{n} = \frac{(1 + h)^n}{n} > \frac{(n-1) h^2}{2} \to \infty \quad (n \to \infty).$$

(ii) The case of $k < 1$ is trivial since $a^n/n^k > a^n/n$ for any $n > 1$.

(iii) If $k > 1$, then $a^{1/k} > 1$ since $a > 1$. Hence, it follows from the result of (i) that for any $M > 1$, we can find an $N$ so that $n > N \Rightarrow (a^{1/k})^n/n > M$. This means that

$$\frac{a^n}{n^k} = \left(\frac{(a^{1/k})^n}{n}\right)^k > M^k > M,$$

which implies that $a^n/n^k \to \infty$. ♣
5. Let $x_n = a^n/n!$ with $a > 0$. Show that the sequence $(x_n)$ converges to 0.

Solution: Let $k$ be a positive integer such that $k > 2a$, and define $c = a^k/k!$. Then for any $a > 0$ and for any $n > k$, we have

$$\frac{a^n}{n!} = c \cdot \frac{a}{k+1} \cdot \frac{a}{k+2} \cdots \frac{a}{n} < \frac{c}{2^{n-k}} = \frac{2^k c}{2^n} < \frac{2^k c}{n}. \tag{2.8}$$

Since (2.8) holds for a sufficiently large $n$ $(> k)$, it also holds for $n$ satisfying $n > 2^k c/\varepsilon$, where $\varepsilon$ is an arbitrarily small positive number. In the latter case, we have

$$\frac{a^n}{n!} < \frac{2^k c}{n} < \varepsilon,$$

which means that

$$\lim_{n\to\infty} x_n = \lim_{n\to\infty} \frac{a^n}{n!} = 0.$$ ♣

2.3 Infinite Series of Real Numbers 


2.3.1 Limits of Infinite Series 


This section focuses on convergence properties of infinite series. The importance of this issue will become apparent, particularly in connection with certain branches of functional analysis such as Hilbert space theory and orthogonal polynomial expansions, where infinite series of numbers (or of functions) enter quite often (see Chaps. 4 and 5).

To begin with, we briefly review the basic properties of infinite series of real numbers. Assume an infinite sequence $(a_1, a_2, \cdots, a_n, \cdots)$ of real numbers. We can then form another infinite sequence $(A_1, A_2, \cdots, A_n, \cdots)$ with the definition

$$A_n = \sum_{k=1}^{n} a_k.$$

Here, $A_n$ is called the $n$th partial sum of the sequence $(a_n)$, and the corresponding infinite sequence $(A_n)$ is called the sequence of partial sums of $(a_n)$. The infinite sequence $(A_n)$ may or may not be convergent, depending on the features of $(a_n)$.

Let us introduce an infinite series defined by

$$\sum_{k=1}^{\infty} a_k = a_1 + a_2 + \cdots. \tag{2.9}$$

The infinite series (2.9) is said to converge if and only if the sequence $(A_n)$ converges to the limit denoted by $A$. In other words, the series (2.9) converges if and only if the sequence of the remainder $R_{n+1} = A - A_n$ converges to zero. When $(A_n)$ is convergent, its limit $A$ is called the sum of the infinite series (2.9), and we may write

$$\sum_{k=1}^{\infty} a_k = \lim_{n\to\infty} \sum_{k=1}^{n} a_k = \lim_{n\to\infty} A_n = A.$$

Otherwise, the series (2.9) is said to diverge.
The limit of the sequence $(A_n)$ is formally defined in line with Cauchy's procedure as shown below.


♠ Limit of a sequence of partial sums:
The sequence of partial sums $(A_n)$ has a limit $A$ if for any small $\varepsilon > 0$, there exists a number $N$ such that

$$n > N \;\Rightarrow\; |A_n - A| < \varepsilon. \tag{2.10}$$
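Examples 1. The infinite series $\sum_{k=1}^{\infty} \dfrac{1}{k(k+1)}$ converges to 1 because

$$A_n = \sum_{k=1}^{n} \frac{1}{k(k+1)} = \sum_{k=1}^{n} \left(\frac{1}{k} - \frac{1}{k+1}\right) = 1 - \frac{1}{n+1} \to 1 \quad (n \to \infty).$$

2. The series $\sum_{k=1}^{\infty} (-1)^k$ diverges because the sequence

$$A_n = \sum_{k=1}^{n} (-1)^k = \begin{cases} 0 & (n \text{ is even}), \\ -1 & (n \text{ is odd}) \end{cases}$$

does not approach any limit.

3. The series $\sum_{k=1}^{\infty} 1 = 1 + 1 + 1 + \cdots$ diverges since the sequence $A_n = \sum_{k=1}^{n} 1 = n$ increases without limit as $n \to \infty$.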
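Partial sums are easy to tabulate directly. The sketch below assumes that the first example is the telescoping series $\sum 1/(k(k+1))$ and that the second is $\sum (-1)^k$; the function names are hypothetical. Exact rational arithmetic makes the closed form $A_n = 1 - 1/(n+1)$ visible without rounding error.

```python
from fractions import Fraction


def telescoping_partial_sum(n):
    """A_n = sum_{k=1}^{n} 1 / (k (k + 1)), computed exactly."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))


def oscillating_partial_sum(n):
    """A_n = sum_{k=1}^{n} (-1)**k, which alternates between -1 and 0."""
    return sum((-1) ** k for k in range(1, n + 1))


if __name__ == "__main__":
    for n in (1, 2, 10, 100):
        print(n, telescoping_partial_sum(n), oscillating_partial_sum(n))
```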


2.3.2 Cauchy Criterion for Infinite Series 
The following is a direct application of the Cauchy criterion to the sequence $(A_n)$, which consists of the partial sums $A_n = \sum_{k=1}^{n} a_k$:

♠ Cauchy criterion for infinite series:
The sequence of partial sums $(A_n)$ converges if and only if for any small $\varepsilon > 0$ there exists a number $N$ such that

$$n, m > N \;\Rightarrow\; |A_n - A_m| < \varepsilon. \tag{2.11}$$


Similarly to the case of real sequences, the Cauchy criterion alluded to above provides a necessary and sufficient condition for convergence of the sequence $(A_n)$. Moreover, from the definition, it also gives a necessary and sufficient condition for convergence of an infinite series $\sum_{k=1}^{\infty} a_k$. Below is an important theorem associated with the latter statement.


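♠ Theorem:
If an infinite series $\sum_{k=1}^{\infty} a_k$ is convergent, then

$$\lim_{n\to\infty} a_n = 0.$$

Proof From hypothesis, we have

$$\lim_{n\to\infty} A_n = \lim_{n\to\infty} A_{n-1} = A.$$

Hence,

$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} (A_n - A_{n-1}) = A - A = 0.$$ ♣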


According to the theorem above, $\lim a_n = 0$ is a necessary condition for the convergence of $A_n$. However, it is not a sufficient condition, as shown in the following example.


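Examples Let $a_k = 1/\sqrt{k}$. Although $\lim_{k\to\infty} a_k = 0$, the corresponding infinite series $\sum a_k$ diverges, as seen from

$$\sum_{k=1}^{n} a_k = 1 + \frac{1}{\sqrt{2}} + \cdots + \frac{1}{\sqrt{n}} \ge \frac{1}{\sqrt{n}} + \frac{1}{\sqrt{n}} + \cdots + \frac{1}{\sqrt{n}} = \frac{n}{\sqrt{n}} = \sqrt{n}.$$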


Remark. The contraposition of the previous theorem serves as a divergence test for the infinite series in question; we can say that

$$\lim_{k\to\infty} a_k \ne 0 \;\Rightarrow\; \sum_{k=1}^{\infty} a_k \text{ is divergent}.$$

2.3.3 Absolute and Conditional Convergence 


Assume an infinite series

$$\sum_{k=1}^{\infty} a_k \tag{2.12}$$

and an associated auxiliary series

$$\sum_{k=1}^{\infty} |a_k|, \tag{2.13}$$

in the latter of which all terms are nonnegative. If the series (2.13) converges, then the series (2.12) is said to converge absolutely. The necessary and sufficient condition for absolute convergence of (2.12) is obtained by replacing $A_n$ in (2.11) by

$$B_n = |a_1| + |a_2| + \cdots + |a_n|.$$

If the series (2.13) diverges and the original series (2.12) converges, we say that the series (2.12) converges conditionally. These results are summarized by the statements below.




♠ Absolute convergence:
The infinite series $\sum a_k$ is absolutely convergent if $\sum |a_k|$ is convergent.


♠ Conditional convergence:
The infinite series $\sum a_k$ is conditionally convergent if $\sum a_k$ is convergent and $\sum |a_k|$ is divergent.


Examples The infinite series

$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} \tag{2.14}$$

converges conditionally, since it converges while its absolute-value series $\sum_{k=1}^{\infty} |(-1)^{k+1}/k| = \sum_{k=1}^{\infty} (1/k)$ diverges. See Exercises 1 and 3 in this section.


The following is an important theorem that we use many times in the 
remainder of this book. 


♠ Theorem:
An infinite series converges if it converges absolutely.


Proof Suppose that the sequence $(B_n)$ consisting of

$$B_n = \sum_{k=1}^{n} |a_k|$$

converges as $n \to \infty$. This means that for any $\varepsilon > 0$ a number $N$ exists such that

$$n, m > N \;\Rightarrow\; |B_n - B_m| < \varepsilon. \tag{2.15}$$

Assuming $n > m$, we rewrite the left-hand side of the inequality in (2.15) as

$$|B_n - B_m| = |a_{m+1}| + |a_{m+2}| + \cdots + |a_n| \ge |a_{m+1} + a_{m+2} + \cdots + a_n| = |A_n - A_m|, \tag{2.16}$$

where we used the triangle inequality for sums. Hence, it follows from (2.15) and (2.16) that

$$n, m > N \;\Rightarrow\; |A_n - A_m| < \varepsilon,$$

which means that the series $\sum a_k$ converges. ♣




The converse of the above theorem is not true. Below we present a well- 
known example of a convergent series that is not absolutely convergent. 


2.3.4 Rearrangements 


Observe that the conditionally convergent series (2.14) expressed by

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \cdots \tag{2.17}$$
may be rearranged in a number of ways, such as 
1 1 1 1 
Tees re ti Sui 
a Se J 5 4 , oe 
os is oi 1 
1 vee 2.19 
2 8 7 var 3 ey) 


or in any other way in which the terms $1, -1/2, 1/3, -1/4, \cdots$ are added in a certain order. Series such as (2.18) and (2.19) are called rearrangements of the series (2.17).

Of importance is the fact that rearranging procedures may change the convergence property of a conditionally convergent series; in what way this happens depends on the nature of the original series, as we shall now see. Suppose a series $\sum a_k$ to be conditionally convergent. Then, the sum of its positive terms and that of its negative terms go, respectively, to $+\infty$ and $-\infty$; otherwise the original series would diverge or converge absolutely. Let $(b_n)$ and $(c_n)$ be, respectively, the subsequences of positive and negative terms of $(a_n)$. Since $\sum_{k=1}^{n} b_k$ is monotonically increasing with respect to $n$ and unbounded, there is a positive integer $m_1$ such that

$$\sum_{k=1}^{m_1} b_k > 1 - c_1.$$

Here the right-hand side is positive since $c_1$ is negative. We rewrite it as

$$\sum_{k=1}^{m_1} b_k + c_1 > 1.$$

Similarly, there is an integer $m_2 > m_1$ such that

$$\sum_{k=m_1+1}^{m_2} b_k + c_2 > 1.$$

Continue the same process for $m_3, m_4, \cdots, m_n$ and take the sum of each side to obtain

$$\sum_{k=1}^{m_n} b_k + \sum_{k=1}^{n} c_k > n. \tag{2.20}$$
k=1 k=1 




Note that the left-hand side is a partial sum of the rearrangement of the sequence $(a_k)$ that may, for instance, take the form of

$$(b_1, b_2, \cdots, b_{m_1}, c_1, b_{m_1+1}, b_{m_1+2}, \cdots, b_{m_2}, c_2, \cdots). \tag{2.21}$$

Clearly, the left-hand side of (2.20) diverges as $n \to \infty$, which means that the rearrangement (2.21) diverges. Therefore, a conditionally convergent series may become divergent through the rearranging procedure. In fact, the discussion above serves as part of the proof of the theorem below.


♠ Riemann theorem:
Given any conditionally convergent series and any $r \in \overline{\mathbb{R}} = \mathbb{R} \cup \{\pm\infty\}$, there is a rearrangement of the series that converges to $r$.


Proof The case of $r = \infty$ was proved in the previous discussion. Now let $r \in \mathbb{R}$ and assume that $(b_n)$ and $(c_n)$ are the subsequences of positive and negative terms, respectively, in the same order in which they appear in $(a_n)$. It is possible to obtain the smallest sum such that

$$s_1 = \sum_{k=1}^{m_1} b_k$$

exceeds $r$. Then, add the least number of negative terms $c_k$ to obtain the largest sum such that

$$s_2 = \sum_{k=1}^{m_1} b_k + \sum_{k=1}^{n_1} c_k$$

is less than $r$. Proceeding in this fashion, we obtain a sequence $s_1, s_2, s_3, \cdots$ that converges to $r$, since

$$\lim_{n\to\infty} b_n = \lim_{n\to\infty} c_n = 0.$$

This result is the case for an arbitrary real number $r$. Hence, the proof is complete. ♣
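The construction in this proof can be imitated numerically. The sketch below is an illustration under stated assumptions, not the authors' procedure verbatim: it takes the alternating harmonic series (2.14), uses floating-point arithmetic, and applies the greedy rule "add unused positive terms while the running sum does not exceed the target, otherwise add unused negative terms." The function name `rearranged_partial_sums` and the chosen targets are hypothetical.

```python
def rearranged_partial_sums(target, steps):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`."""
    next_odd, next_even = 1, 2   # denominators of the next unused +/- terms
    total, sums = 0.0, []
    for _ in range(steps):
        if total <= target:
            total += 1.0 / next_odd    # take the next positive term b_k
            next_odd += 2
        else:
            total -= 1.0 / next_even   # take the next negative term c_k
            next_even += 2
        sums.append(total)
    return sums


if __name__ == "__main__":
    for target in (0.0, 1.5, -0.3):
        sums = rearranged_partial_sums(target, 200_000)
        print(target, sums[-1])
```

Every term of the original series is used exactly once and in its original relative order within the positive and negative subsequences, so each output is a partial sum of a genuine rearrangement. The overshoot at each crossing is bounded by the size of the last term used, which tends to zero.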


Exercises 


1. Determine the convergence property of the series

$$\sum_{k=1}^{\infty} \frac{1}{k}. \tag{2.22}$$

This is known as a harmonic series.

Solution: Let $A_n = \sum_{k=1}^{n} (1/k)$. We then have

$$A_{2n} - A_n = \frac{1}{n+1} + \frac{1}{n+2} + \cdots + \frac{1}{2n} \ge \frac{1}{2n} \times n = \frac{1}{2},$$

which implies that the sequence $(A_n)$ is not a Cauchy sequence. Thus, in view of the Cauchy criterion, the harmonic series (2.22) diverges. ♣

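2. Determine the convergence of the series

$$\sum_{k=1}^{\infty} \frac{1}{k^p}. \tag{2.23}$$

This is called a hyperharmonic series (or zeta function) and is denoted by $\zeta(p)$.

Solution: When $p \le 1$, a partial sum $A_{2^n}$ consisting of the first $2^n$ terms reads

$$A_{2^n} = 1 + \frac{1}{2^p} + \left(\frac{1}{3^p} + \frac{1}{4^p}\right) + \left(\frac{1}{5^p} + \cdots + \frac{1}{8^p}\right) + \cdots + \left(\frac{1}{(2^{n-1}+1)^p} + \cdots + \frac{1}{(2^n)^p}\right)$$
$$\ge 1 + \frac{1}{2} + \left(\frac{1}{3} + \frac{1}{4}\right) + \left(\frac{1}{5} + \cdots + \frac{1}{8}\right) + \cdots + \left(\frac{1}{2^{n-1}+1} + \cdots + \frac{1}{2^n}\right)$$
$$\ge 1 + \frac{1}{2} + 2 \times \frac{1}{4} + 4 \times \frac{1}{8} + \cdots + 2^{n-1} \times \frac{1}{2^n} = 1 + \frac{n}{2}.$$

This means that the series (2.23) diverges for $p \le 1$.
For $p > 1$, we have

$$A_{2^{n+1}-1} = 1 + \left(\frac{1}{2^p} + \frac{1}{3^p}\right) + \left(\frac{1}{4^p} + \cdots + \frac{1}{7^p}\right) + \cdots + \left(\frac{1}{(2^n)^p} + \cdots + \frac{1}{(2^{n+1}-1)^p}\right)$$
$$\le 1 + \frac{2}{2^p} + \frac{4}{4^p} + \cdots + \frac{2^n}{(2^n)^p} = \sum_{k=0}^{n} \left(\frac{1}{2^{p-1}}\right)^k < \frac{1}{1 - (1/2^{p-1})} = \frac{2^{p-1}}{2^{p-1} - 1}.$$

Hence, the monotonically increasing sequence $(A_n)$ is bounded above and is thus convergent. ♣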




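3. Determine the convergence of the series

$$\sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k}. \tag{2.24}$$

Solution: Let $n$ be an even integer, say $n = 2m$. Then, it follows that

$$A_{2m} = \sum_{k=1}^{2m} \frac{(-1)^{k+1}}{k} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{2m-1} - \frac{1}{2m} = \frac{1}{2 \cdot 1} + \frac{1}{4 \cdot 3} + \cdots + \frac{1}{2m(2m-1)},$$

which means that $(A_{2m})$ is increasing with $m$. In addition, we have

$$A_{2m} = 1 - \left(\frac{1}{2} - \frac{1}{3}\right) - \left(\frac{1}{4} - \frac{1}{5}\right) - \cdots - \frac{1}{2m} < 1,$$

which indicates that $(A_{2m})$ is bounded above. Hence, $(A_{2m})$ converges to a limit $A$. Furthermore, since $A_{2m+1} = A_{2m} + 1/(2m+1)$, the same discussion as above tells us that the sequence $(A_{2m+1})$ also converges to the common limit $A$. By applying the result for shuffled sequences (see Exercise 3 in Sect. 2.2), we find that $\lim A_n$ exists, so the series (2.24) converges. Since its absolute-value series is the divergent harmonic series (2.22), it is thus proven that the series converges conditionally. ♣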


4. Suppose that the infinite series $\sum_k a_k$ and $\sum_k b_k$ are both absolutely convergent. Let $(a_i b_j)$ be an infinite sequence in which the terms $a_i b_j$ are arranged in an arbitrary order, say, as

$$(a_2 b_1, a_1 b_3, a_2 b_4, a_5 b_1, \cdots).$$

Show that the sequence of the partial sums of $(a_i b_j)$ converges absolutely regardless of the order of the terms $a_i b_j$.

Solution: Let $m$ and $n$ be the maximum values of $i$ and $j$, respectively, that are involved in the partial sum $\sum_{(i,j)} a_i b_j$; here $(i,j)$ denotes the possible combinations of $i$ and $j$ that are arranged in the same order as in the sequence $(a_i b_j)$. The partial sum is a portion of the product of the finite sums given by $\left(\sum_{i=1}^{m} a_i\right)\left(\sum_{j=1}^{n} b_j\right)$. Hence, we have

$$\left|\sum_{(i,j)} a_i b_j\right| \le \sum_{(i,j)} |a_i b_j| \le \sum_{i=1}^{m} |a_i| \sum_{j=1}^{n} |b_j|. \tag{2.25}$$

From hypothesis, the right-hand side in (2.25) converges as $m, n \to \infty$. This means that the partial sum $\sum_{(i,j)} |a_i b_j|$ is bounded above. In addition, it is obviously increasing. Therefore, $\sum_{(i,j)} |a_i b_j|$ converges (i.e., $\sum_{(i,j)} a_i b_j$ converges absolutely) independently of the order of $i$ and $j$ in the sequence $(a_i b_j)$. ♣


5. Show that rearrangements of absolutely convergent series always converge absolutely to the same limit.

Solution: Let $\sum_{k=1}^{\infty} a_k$ be absolutely convergent and assume that $\sum_{k=1}^{\infty} b_k$ is its rearrangement. Define $A_n = \sum_{k=1}^{n} |a_k|$, $A = \lim_{n\to\infty} A_n$, $B_n = \sum_{k=1}^{n} |b_k|$, and let $\varepsilon > 0$. By hypothesis, there is an integer $N$ such that $|A - A_N| = |a_{N+1}| + |a_{N+2}| + \cdots < \varepsilon/2$. Now we choose the integer $M$ so that all the terms $a_1, a_2, \cdots, a_N$ appear in the first $M$ terms of the rearranged series, i.e., within the finite sequence $(b_1, b_2, \cdots, b_M)$. Hence, these terms do not contribute to the difference $B_m - A_N$, where $m \ge M$. Consequently, we obtain

$$m \ge M \;\Rightarrow\; |B_m - A_N| \le |a_{N+1}| + |a_{N+2}| + \cdots < \frac{\varepsilon}{2}$$
$$\Rightarrow\; |A - B_m| \le |A - A_N| + |A_N - B_m| < \varepsilon,$$

which shows that $\lim_{n\to\infty} B_n = A$. ♣
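The key inequality of Exercise 1, $A_{2n} - A_n \ge 1/2$, is what prevents the harmonic partial sums from forming a Cauchy sequence. A minimal sketch with exact rational arithmetic follows; the range of $n$ and the names `harmonic` and `block_gap` are assumptions made for illustration.

```python
from fractions import Fraction


def harmonic(n):
    """A_n = 1 + 1/2 + ... + 1/n as an exact rational number."""
    return sum(Fraction(1, k) for k in range(1, n + 1))


def block_gap(n):
    """A_{2n} - A_n = 1/(n+1) + ... + 1/(2n)."""
    return sum(Fraction(1, k) for k in range(n + 1, 2 * n + 1))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        # The gap never drops below 1/2, however large n becomes.
        print(n, float(block_gap(n)))
```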


2.4 Convergence Tests for Infinite Real Series 


2.4.1 Limit Tests 


This section covers the important tests for convergence of infinite series. In 
general, these tests provide sufficient, not necessary, conditions for conver- 
gence. This is in contrast to the Cauchy criterion, which provides a neces- 
sary and sufficient condition for convergence, though it is difficult to apply in 
practice. The first test to be shown is called the limit test, by which we can 
examine the absolute convergence of infinite series quite easily. 




♠ Limit test for convergence:
If

$$\lim_{k\to\infty} k^p a_k \text{ exists for some } p > 1,$$

then $\sum_{k=1}^{\infty} a_k$ converges absolutely (and thus converges in the ordinary sense).


Proof By hypothesis, we set $\lim_{k\to\infty} k^p a_k = A$ for a certain $p > 1$, which implies that

$$\lim_{k\to\infty} k^p |a_k| = |A|.$$

Hence, there exists an integer $m$ such that

$$k > m \;\Rightarrow\; k^p |a_k| - |A| < 1,$$

or equivalently,

$$k > m \;\Rightarrow\; |a_k| < \frac{|A| + 1}{k^p}. \tag{2.26}$$

We know that the series $\sum_{k=m}^{\infty} 1/k^p$ converges for any $p > 1$ (see Exercise 2 in Sect. 2.3). Thus it follows from (2.26) that the series $\sum_{k=m}^{\infty} |a_k|$ also converges, from which the desired conclusion follows at once. ♣


There is a counterpart of the limit test for convergence that determines 
divergence properties of series as follows. 


♠ Limit test for divergence:
If

$$\lim_{k\to\infty} k a_k \ne 0,$$

then $\sum_{k=1}^{\infty} a_k$ diverges. The test fails if the limit equals zero.


Proof Suppose $\lim k a_k = A > 0$. Then there exists an integer $m$ such that

$$k > m \;\Rightarrow\; k a_k > \frac{A}{2}.$$

Hence, by employing the result for the harmonic series (see Exercise 1 in Sect. 2.3), we obtain

$$\sum_{k=m}^{\infty} a_k > \frac{A}{2} \sum_{k=m}^{\infty} \frac{1}{k} \to \infty,$$

from which the desired result follows. The same procedure can be applied to the case of $A < 0$, in which case the series $\sum_{k=1}^{\infty} (-a_k)$ may be treated by the procedure above. The proof is thus complete. ♣


Remark.

1. The test is valid even when $A$ goes to infinity.

2. The divergence test described above is inconclusive when $\lim k a_k = 0$. To see why, consider the two series

$$\sum_{k=1}^{\infty} \frac{1}{k^2} \quad \text{and} \quad \sum_{k=2}^{\infty} \frac{1}{k \log k}.$$

The former converges and the latter diverges, but both yield $\lim k a_k = 0$.
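The two series in the remark can be compared numerically. In the sketch below (an illustration only, assuming the natural logarithm and hypothetical helper names), $k a_k$ is small for both series at large $k$, yet the partial sums of $1/k^2$ stay below $\pi^2/6$ while those of $1/(k \log k)$ keep growing, roughly like $\log \log k$.

```python
import math


def a_square(k):
    """Term of the convergent series: 1 / k**2."""
    return 1.0 / k**2


def a_klogk(k):
    """Term of the divergent series: 1 / (k log k), defined for k >= 2."""
    return 1.0 / (k * math.log(k))


def partial_sum(term, first, last):
    """Sum term(k) for k = first, ..., last with compensated summation."""
    return math.fsum(term(k) for k in range(first, last + 1))


if __name__ == "__main__":
    k = 10**6
    print(k * a_square(k), k * a_klogk(k))   # both values are close to zero
    print(partial_sum(a_square, 1, 10**5))   # stays below pi**2 / 6
    print(partial_sum(a_klogk, 2, 10**5))    # still increasing, very slowly
```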


2.4.2 Ratio Tests 


The following provides another test for absolute convergence of infinite series 
that is sometimes easier to use than the previous one. 


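♠ Ratio test:
A series $\sum_{k=0}^{\infty} a_k$ converges absolutely (and thus converges in the ordinary sense) if

$$\limsup_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| < 1 \tag{2.27}$$

and diverges if

$$\liminf_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| > 1. \tag{2.28}$$

If neither condition holds (for instance, when the ratio tends to 1), the test is inconclusive.


Remark. When $|a_{k+1}/a_k|$ converges, the limit superior in (2.27) and the limit inferior in (2.28) reduce to the ordinary limit.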


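Proof (i) Suppose that $c = \limsup_{k\to\infty} \left|\dfrac{a_{k+1}}{a_k}\right| < 1$. Then, for any $r \in (c, 1)$, we can find the number $m$ such that

$$k \ge m \;\Rightarrow\; \left|\frac{a_{k+1}}{a_k}\right| < r.$$

It follows that

$$\left|\frac{a_{m+p}}{a_m}\right| = \left|\frac{a_{m+1}}{a_m}\right| \times \left|\frac{a_{m+2}}{a_{m+1}}\right| \times \cdots \times \left|\frac{a_{m+p}}{a_{m+p-1}}\right| < r^p,$$

or equivalently, $|a_{m+p}| < r^p |a_m|$, which holds for any $p \in \mathbb{N}$. Hence, we have

$$\sum_{p=1}^{\infty} |a_{m+p}| = \sum_{k=m+1}^{\infty} |a_k| < \sum_{p=1}^{\infty} r^p |a_m| = \frac{r}{1-r} |a_m|.$$

The last term is a finite constant. Therefore, the series $\sum_{k=m+1}^{\infty} |a_k|$ remains finite and the series $\sum_{k=1}^{\infty} a_k$ converges absolutely.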


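(ii) Next we assume that

$$\liminf_{k\to\infty} \left|\frac{a_{k+1}}{a_k}\right| > 1.$$

Then there is an integer $m$ such that

$$k \ge m \;\Rightarrow\; \left|\frac{a_{k+1}}{a_k}\right| > 1.$$

That is,

$$k > m \;\Rightarrow\; |a_k| > |a_m| > 0,$$

which means that

$$\lim_{k\to\infty} a_k \ne 0.$$

In view of the remark in Sect. 2.3.2, the series $\sum_{k=1}^{\infty} a_k$ diverges. ♣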
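To see the ratio test at work on a concrete case, the sketch below evaluates the ratios $a_{k+1}/a_k$ for $a_k = (k!)^2/(2k)!$. The choice of series and the function names are assumptions made for illustration; exact rational arithmetic is used so that the simplified form $(k+1)/(2(2k+1))$ can be compared directly.

```python
from fractions import Fraction
from math import factorial


def term(k):
    """a_k = (k!)**2 / (2k)! as an exact rational number."""
    return Fraction(factorial(k) ** 2, factorial(2 * k))


def ratio(k):
    """a_{k+1} / a_k, which simplifies to (k + 1) / (2 (2k + 1))."""
    return term(k + 1) / term(k)


if __name__ == "__main__":
    for k in (1, 10, 100, 1000):
        # The ratios decrease toward 1/4, which is below 1.
        print(k, float(ratio(k)))
```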


2.4.3 Root Tests 


We now give an alternative absolute-convergence test based on examining the $k$th root of $|a_k|$.


♠ Root test:
A series $\sum_{k=0}^{\infty} a_k$ converges absolutely (and in the ordinary sense) if

$$\limsup_{k\to\infty} \sqrt[k]{|a_k|} < 1$$

and diverges if

$$\limsup_{k\to\infty} \sqrt[k]{|a_k|} > 1.$$

If the limit superior is 1, the test fails and does not provide any information.


Proof Let $r = \limsup_{k\to\infty} \sqrt[k]{|a_k|}$. We first prove that the series converges absolutely if $r < 1$. We choose a positive number $c \in (r, 1)$. Then there is a positive integer $N$ such that

$$k > N \;\Rightarrow\; \sqrt[k]{|a_k|} < c \;\Rightarrow\; |a_k| < c^k.$$

Since the geometric series $\sum c^k$ with $c < 1$ converges, $\sum |a_k|$ converges, so that $\sum a_k$ converges absolutely.

When $r > 1$, it follows from the definition of the limit superior (see Sect. 2.1.4) that there are an infinite number of terms of $\sqrt[k]{|a_k|}$ greater than 1. This implies that $\lim a_k \ne 0$, which means that the infinite series $\sum a_k$ diverges. ♣


Examples Assume the series 


ee 1 1 1 1 1 
Soa, =1 Sige ge Ge ets (2.29) 
k=0 
Since 
1 1 1 
Vi \@oh= Ts Af lea p= 5 V |a2| = mn ¥ |a3| = 5? Vlaal = -5-°:, 
we have 


1 
lim sup ¢/az, = 3 <i. 


k—0o 


Thus the series (2.29) converges (absolutely and ordinary). 
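The role of the limit superior in the root test can be illustrated with a series whose $k$th roots oscillate. The series below is a hypothetical one chosen for this sketch and is not the series (2.29): $a_k = 2^{-k}$ for even $k$ and $a_k = 3^{-k}$ for odd $k$, so that $\sqrt[k]{|a_k|}$ alternates between $1/2$ and $1/3$ and its limit superior is $1/2 < 1$.

```python
def term(k):
    """Illustrative series (an assumption): 2**-k for even k, 3**-k for odd k."""
    base = 2.0 if k % 2 == 0 else 3.0
    return base ** (-k)


def kth_root(k):
    """k-th root of |a_k| for k >= 1."""
    return term(k) ** (1.0 / k)


def partial_sum(last):
    """Sum of a_k for k = 1, ..., last."""
    return sum(term(k) for k in range(1, last + 1))


if __name__ == "__main__":
    print([round(kth_root(k), 6) for k in range(1, 9)])
    # The largest root over a late window estimates the limit superior.
    print(max(kth_root(k) for k in range(20, 41)))
    print(partial_sum(60))
```

Summing the even and odd terms separately as geometric series gives $1/3 + 3/8 = 17/24$, which the partial sums approach.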


2.4.4 Alternating Series Test 


All the convergence tests presented so far are tests for absolute convergence, 
which assumes ordinary convergence. Nonetheless, certain kinds of series can 
exhibit conditional convergence, i.e., ordinary convergence with absolute di- 
vergence, whose convergence properties cannot be addressed by the tests given 
thus far. Hence, the significance of the test described below, known as the 
alternating series test, is that it may be used to test the conditional con- 
vergence of some absolutely divergent series. 

We say that $(x_k)$ is an alternating sequence if the sign of $x_k$ is different from that of $x_{k+1}$ for every $k$. The resulting series $\sum x_k$ is called an alternating series, whose convergence properties are partly determined by the following theorem:


♠ Alternating series test:
An alternating series given by

$$a_1 - a_2 + a_3 - a_4 + \cdots = \sum_{k=1}^{\infty} (-1)^{k+1} a_k \quad \text{with } a_k > 0 \text{ for all } k$$

converges if

$$a_k > a_{k+1} \quad \text{and} \quad \lim_{k\to\infty} a_k = 0.$$




Proof First we show that the sequence of partial sums $A_{2n}$ converges. It follows that

$$A_{2n} = (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2n-1} - a_{2n}).$$

Since $a_k - a_{k+1} > 0$ for all $k$, the sequence $A_{2n}$ is increasing. It is also bounded above because

$$A_{2n} = a_1 - (a_2 - a_3) - (a_4 - a_5) - \cdots - (a_{2n-2} - a_{2n-1}) - a_{2n} \le a_1 \tag{2.30}$$

for all $n \in \mathbb{N}$. Thus, $\lim A_{2n}$ exists and we call it $A$. On the other hand, we have

$$|A_{2n+1} - A| = |A_{2n} + a_{2n+1} - A| \le |A_{2n} - A| + |a_{2n+1}|.$$

In the limit as $n \to \infty$, the right-hand side vanishes so that we obtain $\lim A_{2n+1} = A$. Therefore, we conclude that $A_n \to A$. ♣
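The two facts used in this proof, that the even-indexed partial sums increase and stay below $a_1$, can be observed numerically. The sketch below takes $a_k = 1/k$ as an illustrative choice (the alternating harmonic series, whose sum is known to be $\log 2$) and uses floating-point arithmetic; the function name `partial_sums` is hypothetical.

```python
import math


def partial_sums(count):
    """[A_1, ..., A_count] for the series sum_{k>=1} (-1)**(k+1) / k."""
    sums, total = [], 0.0
    for k in range(1, count + 1):
        total += (-1) ** (k + 1) / k
        sums.append(total)
    return sums


if __name__ == "__main__":
    sums = partial_sums(2000)
    even = sums[1::2]   # A_2, A_4, ...: increasing and bounded by a_1 = 1
    odd = sums[0::2]    # A_1, A_3, ...: decreasing toward the same limit
    print(even[:3], odd[:3])
    print(sums[-1], math.log(2))
```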


Exercises 


00. (k +1)1/2 


1. Show that 2 (i +B converges. 


Solution: Taking p = 7/6 > 1 into the limit test for convergence, 
we have 


lim k7/6 a, = jim Uae cae ls 


k00 co (1—k-2 + k- mys * 


2. Show that $\sum_{k=1}^{\infty} (-1)^k \dfrac{\log k}{k^2}$ converges.

Solution: Using the limit test for convergence with $p = 3/2$, we obtain

$$\lim_{k\to\infty} k^{3/2} a_k = \lim_{k\to\infty} (-1)^k \frac{\log k}{\sqrt{k}} = 0.$$ ♣


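3. Show that $\sum_{k=1}^{\infty} \dfrac{k \log k}{1 + k^2}$ diverges.

Solution: From the limit test for divergence, we have

$$\lim_{k\to\infty} k a_k = \lim_{k\to\infty} \frac{k^2 \log k}{1 + k^2} = \infty.$$ ♣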




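4. Show that $\sum_{k=1}^{\infty} \dfrac{(k!)^2}{(2k)!}$ converges.

Solution: The ratio test yields

$$\left|\frac{a_{k+1}}{a_k}\right| = \frac{(2k)!}{(k!)^2} \cdot \frac{[(k+1)!]^2}{(2k+2)!} = \frac{(k+1)^2}{(2k+2)(2k+1)} = \frac{\left(1 + \frac{1}{k}\right)^2}{\left(2 + \frac{2}{k}\right)\left(2 + \frac{1}{k}\right)} \to \frac{1}{4} < 1 \quad (k \to \infty).$$ ♣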

5. Show that $\sum_{k=1}^{\infty} \left(1 + \dfrac{1}{k}\right)^{-k^2}$ converges.

Solution: The root test yields

$$\sqrt[k]{|a_k|} = \frac{1}{[1 + (1/k)]^k} \to \frac{1}{e} < 1 \quad (k \to \infty).$$ ♣


3 


Real Functions 


Abstract Infinite sequences and series of real functions are encountered frequently 
in mathematical physics. The convergence of such sequences and series does not 
generally preserve the nature of their constituents; e.g., a sequence of “continuous” 
functions can converge into a “discontinuous” function. In this chapter, we show 
that this is not true in cases of uniform convergence (Sect. 3.2.2), which is a special 
class of convergence that preserves the continuity, integrability, and differentiability 
of the constituent functions of sequences and series, as we explain in detail in Sects. 
3.2.4-3.2.6. 


3.1 Fundamental Properties 


3.1.1 Limit of a Function 


Having discussed the limits of sequences and series of real numbers, we now turn our attention to the limit of functions. Let $A$ be a real number and $f(x)$ a real-valued function of a real variable $x \in \mathbb{R}$. A formal notation of the above function is given by the mapping relation $f : \mathbb{R} \to \mathbb{R}$. The statement "the limit of $f(x)$ at $x = a$ is $A$" means that the value of $f(x)$ can be set as close to $A$ as desired by setting $x$ sufficiently close to $a$. This is stated formally by the following definition.


♠ The limit of a function:
A function $f(x)$ is said to have the limit $A$ as $x \to a$ if and only if for every $\varepsilon > 0$, there exists a number $\delta > 0$ such that

$$0 < |x - a| < \delta \;\Rightarrow\; |f(x) - A| < \varepsilon. \tag{3.1}$$


The limit of f(a) is written symbolically as 
lim f(z) =A 


ra 


or

\[ f(x) \to A \quad \text{for } x \to a. \]


If the first inequality in (3.1) is replaced by $0 < x - a < \delta$ (or $0 < a - x < \delta$),
we say that $f(x)$ approaches $A$ as $x \to a$ from above (or below) and write

\[ \lim_{x \to a+} f(x) = A \quad \left(\text{or } \lim_{x \to a-} f(x) = A\right). \]

This is called the right-hand (or left-hand) limit of $f(x)$. The two together
are known as one-sided limits.

A necessary and sufficient condition for the existence of $\lim_{x \to a} f(x)$ is
shown below.


@ Theorem:
The limit of $f(x)$ at $x = a$ exists if and only if

\[ \lim_{x \to a+} f(x) = \lim_{x \to a-} f(x). \tag{3.2} \]


Proof If $\lim_{x \to a} f(x)$ exists and is equal to $A$, it readily follows that

\[ \lim_{x \to a+} f(x) = \lim_{x \to a-} f(x) = A. \tag{3.3} \]


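We now consider the converse. Assume that (3.2) holds, and denote the common
value of the two one-sided limits by $A$. Hence, given $\varepsilon > 0$, we have $\delta_1 > 0$
and $\delta_2 > 0$ such that

\[ 0 < x - a < \delta_1 \;\Longrightarrow\; |f(x) - A| < \varepsilon, \]
\[ 0 < a - x < \delta_2 \;\Longrightarrow\; |f(x) - A| < \varepsilon. \]

Let $\delta = \min\{\delta_1, \delta_2\}$. If $x$ satisfies $0 < |x - a| < \delta$, then either

\[ 0 < x - a < \delta \le \delta_1 \quad \text{or} \quad 0 < a - x < \delta \le \delta_2. \]

In either case, we have $|f(x) - A| < \varepsilon$. That is, we have seen that for a given
$\varepsilon$, there exists $\delta$ such that

\[ 0 < |x - a| < \delta \;\Longrightarrow\; |f(x) - A| < \varepsilon. \]

Therefore we conclude that

\[ \text{Equation (3.2) holds} \;\Longrightarrow\; \lim_{x \to a} f(x) = A, \]

and the proof is complete. ♣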




3.1.2 Continuity of a Function 


In general, the value of $\lim_{x \to a} f(x)$ has nothing to do with the value (and
the existence) of $f(a)$. For instance, the function given by


tor ci, 


am iy ee 


gives

\[ \lim_{x \to 1} f(x) = 0 \quad \text{and} \quad f(1) = 2, \]

which differ from one another. This mismatch occurring
at $x = 1$ results in a lack of geometrical continuity in the curve of $y = f(x)$,
as depicted in Fig. 3.1. In mathematical language, continuity of the curve of
$y = f(x)$ is accounted for by the following statement.


Fig. 3.1. A discontinuous function $y = f(x)$ at $x = 1$


@ Continuous functions:
The function $f(x)$ is said to be continuous at $x = a$ if and only if for
every $\varepsilon > 0$, there exists $\delta > 0$ such that

\[ |x - a| < \delta \;\Longrightarrow\; |f(x) - f(a)| < \varepsilon. \]


Remark. The definition noted above seems to be similar to the definition of the
limit of $f(x)$ at $x = a$ (see Sect. 3.1.1). However, there is a crucial difference
between them. When considering the limit of $f(x)$ at $x = a$, we are only
interested in the behavior of $f(x)$ in the vicinity of the point $a$, not at $a$ itself.
The continuity of $f(x)$ at $x = a$, however, requires the further condition that
the value of $f(x)$ at $x = a$ be defined and coincide with the limit. In symbols,
we write

\[ f(x) \text{ is continuous at } x = a \;\Longrightarrow\; \lim_{x \to a-0} f(x) = \lim_{x \to a+0} f(x) = f(a). \]


We must emphasize that given a function $f(x)$ on a domain $D$, the limit of
$f(x)$ is defined at limit points of $D$ that may or may not lie in $D$. In contrast,
the continuity of $f(x)$ is defined only at points contained in $D$. An illustrative
example is given below.


Examples Assume a function given by

\[ f(x) = x \quad \text{for all } x \text{ but } x = 1. \]

It has a limit at $x = 1$,

\[ \lim_{x \to 1} f(x) = 1, \]

but there is no way to examine its continuity because $x = 1$ is outside the
domain of definition.


When $f(x)$ is continuous, we can say that $f(x)$ belongs to the class of functions
designated by the symbol $C$. Then, it follows that

\[ f(x) \in C \text{ at } x = a \;\Longleftrightarrow\; \lim_{x \to a} f(x) = f(a). \]


If the symbol $x \to a$ appearing in the right-hand statement is replaced by
$x \to a+$ (or $x \to a-$), $f(x)$ is said to be continuous on the right (or left)
at $x = a$. We encounter this kind of limit particularly when we consider
the continuity of a function defined on a finite closed interval $[a, b]$; we say that

\[ f(x) \in C \text{ on } [a,b] \;\Longleftrightarrow\; f(x) \in C \text{ on } (a,b) \text{ and } \lim_{x \to a+} f(x) = f(a), \ \lim_{x \to b-} f(x) = f(b). \]

We also say that a function $f(x)$ on $[a, b]$ is piecewise continuous if

(i) $f(x)$ is continuous on $[a, b]$ except at a finite number of points
$x_1, x_2, \cdots, x_n$;

(ii) at each of the points $x_1, x_2, \cdots, x_n$, there exist both the left-hand and
right-hand limits of $f(x)$ defined by

\[ f(x_k - 0) = \lim_{x \to x_k -} f(x), \qquad f(x_k + 0) = \lim_{x \to x_k +} f(x). \]


3.1.3 Derivative of a Function 


The following is a rigorous definition of the derivative of a real function. 


@ Derivative of a function:
If the limit

\[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \]

exists, it is called the derivative of $f(x)$ at $x = a$ and is denoted by $f'(a)$.
The function $f(x)$ is said to be differentiable at $x = a$ if $f'(a)$ exists.


Similar to the case of one-sided limits, it is possible to define one-sided
derivatives of real functions such as

\[ f'(a+) = \lim_{x \to a+} \frac{f(x) - f(a)}{x - a}, \qquad f'(a-) = \lim_{x \to a-} \frac{f(x) - f(a)}{x - a}. \]


@ Theorem:
If $f(x)$ is differentiable at $x = a$, then it is continuous at $x = a$. (The
converse is not true.)


Proof Assume $x \ne a$. Then

\[ f(x) - f(a) = \frac{f(x) - f(a)}{x - a}\,(x - a). \]

From hypothesis, each function $[f(x) - f(a)]/(x - a)$ as well as $x - a$ has the
limit at $x = a$. Hence, we obtain

\[ \lim_{x \to a} [f(x) - f(a)] = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a} (x - a) = f'(a) \times 0 = 0. \]

Therefore,

\[ \lim_{x \to a} f(x) = f(a), \]

i.e., $f(x)$ is continuous at $x = a$. That the converse is false can be seen by
considering $f(x) = |x|$; it is continuous at $x = 0$ but not differentiable. ♣
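The failure of the converse can be checked numerically. The following Python
sketch is only an illustration: the helper name `difference_quotient` and the
step sizes are choices made here, not part of the text. It evaluates the
difference quotient of $f(x) = |x|$ at $a = 0$ from the right and from the left;
the two one-sided values stay at +1 and -1, so no common limit can exist.

```python
def difference_quotient(f, a, h):
    """Return [f(a + h) - f(a)] / h, the quantity whose limit h -> 0 defines f'(a)."""
    return (f(a + h) - f(a)) / h


if __name__ == "__main__":
    for h in (1e-1, 1e-3, 1e-6):
        right = difference_quotient(abs, 0.0, h)   # approach a = 0 from above
        left = difference_quotient(abs, 0.0, -h)   # approach a = 0 from below
        print(f"h = {h:g}: right = {right:+.1f}, left = {left:+.1f}")
```

For a differentiable function such as $x^2$ the same helper gives one-sided
values that approach each other as the step shrinks.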


The term $C^n$ functions is used to indicate that all the derivatives of order
$\le n$ exist and are continuous; this is denoted by

\[ f(x) \in C^n \;\Longrightarrow\; f^{(n)}(x) \in C. \]

Such an $f(x)$ is said to be a $C^n$ function or to be of class $C^n$.
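Examples 1. $f(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$
$\Longrightarrow f(x) \in C^0 (= C)$, but $f(x) \notin C^1$ at $x = 0$.


2. $f(x) = \begin{cases} 0, & x < 0 \\ x^2, & x \ge 0 \end{cases}$
$\Longrightarrow f(x) \in C^1$, but $f(x) \notin C^2$ at $x = 0$.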


3. Taylor series expansion for functions $f \in C^n$ is given by

\[ f(x) = \sum_{k \le n} \frac{1}{k!} \left. \frac{d^k f}{dx^k} \right|_{x = x_0} (x - x_0)^k + o\left(|x - x_0|^n\right). \]




3.1.4 Smooth Functions 


We now introduce a new class of functions for which the derivative is contin- 
uous over the defining domain. 


@ Smooth functions:
The function $f(x)$ is said to be smooth for any $x \in [a, b]$ if $f'(x)$ exists
and is continuous on $[a, b]$.


In geometrical language, the above statement means that the direction of the 
tangent changes continuously, without jumps, as it moves along the curve 
y = f(x) (see Fig. 3.2). Thus, the graph of a smooth function is a smooth 
curve without any point at which the curve has two distinct tangents. 
Similar to the case of piecewise continuity, the function $f(x)$ is said to
be piecewise smooth on the interval $[a, b]$ if $f(x)$ and its derivatives are
all piecewise continuous on $[a, b]$. The graph of a piecewise smooth function
is either a continuous or a discontinuous curve; furthermore, it can have a
finite number of points (called corners) at which the derivatives show jumps
(see Fig. 3.2). Every piecewise smooth function $f(x)$ is bounded and has
a bounded derivative everywhere, except at its corners and points of
discontinuity; $f'(x)$ does not exist in the sense of continuity at any of these
points.


Fig. 3.2. (a) A continuous function y = f(x). (b) A piecewise smooth function 
y = f(x) having two discontinuous points and one corner 


3.2 Sequences of Real Functions 


3.2.1 Pointwise Convergence 


In this section we focus on convergence properties of sequences consisting of
real-valued functions of a real variable. Suppose that for each $n \in \mathbb{N}$, we have
a function $f_n(x)$ defined on a domain $D \subset \mathbb{R}$. We then say that we have a
sequence

\[ (f_n(x) : n \in \mathbb{N}) \]

of real-valued functions on $D$. If the sequence $(f_n(x))$ converges for every
$x \in D$, the sequence of functions is said to converge pointwise on $D$, and
the function defined by

\[ f(x) = \lim_{n \to \infty} f_n(x) \]

is called the pointwise limit of $(f_n(x))$. The formal definition is given below.


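@ Pointwise convergence:

The sequence of functions $(f_n)$ is said to converge pointwise to $f$ on
$D$ if, given $\varepsilon > 0$ and $x \in D$, there is a natural number $N = N(\varepsilon, x)$
(which depends on $\varepsilon$ and $x$) such that

\[ n > N \;\Longrightarrow\; |f_n(x) - f(x)| < \varepsilon. \]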


Fig. 3.3. Converging behavior of $f_n(x) = x^n$ given in (3.4)


Examples Assume a sequence $(f_n)$ consisting of the function

\[ f_n(x) = x^n \tag{3.4} \]

that is defined on a closed interval $[0, 1]$. It follows that the sequence converges
pointwise to

\[ f(x) = \lim_{n \to \infty} f_n(x) = \begin{cases} 0 & \text{for } 0 \le x < 1, \\ 1 & \text{for } x = 1. \end{cases} \tag{3.5} \]

See Fig. 3.3 for the converging behavior of $f_n(x)$ with increasing $n$.
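The dependence of $N$ on $x$ in the definition of pointwise convergence can be
made concrete for $f_n(x) = x^n$ on $[0, 1]$. The Python sketch below is an
illustration only; the function names and the brute-force search are
assumptions made here. For a fixed tolerance it finds the smallest index from
which $|f_n(x) - f(x)|$ stays below the tolerance, and that index grows without
bound as $x$ approaches 1.

```python
def f_n(x, n):
    """n-th member of the sequence (3.4): f_n(x) = x**n on [0, 1]."""
    return x ** n


def pointwise_limit(x):
    """Pointwise limit (3.5): 0 on [0, 1) and 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def smallest_n(x, eps):
    """Smallest n with |f_n(x) - f(x)| < eps, by direct search (0 <= x <= 1).

    Since x**n decreases with n on [0, 1), the bound then holds for all larger n.
    """
    n = 1
    while abs(f_n(x, n) - pointwise_limit(x)) >= eps:
        n += 1
    return n


if __name__ == "__main__":
    eps = 0.01
    for x in (0.5, 0.9, 0.99, 0.999, 1.0):
        print(f"x = {x}: |f_n(x) - f(x)| < {eps} from n = {smallest_n(x, eps)}")
```

No single index works for all $x$ in $[0, 1)$ at once, which is the numerical
face of the discontinuity of the limit function at $x = 1$.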

The important point is the fact that under pointwise convergence, continuity
of the functions $f_n(x)$ is not preserved. In fact, $f_n(x)$ given in (3.4) is
continuous for each $n$ over the whole interval $[0, 1]$, whereas the limit $f(x)$
given in (3.5) is discontinuous at $x = 1$. This indicates that interchanging
the order of the limiting processes under pointwise convergence may produce
different results, as expressed by

\[ \lim_{x \to 1} \lim_{n \to \infty} f_n(x) \ne \lim_{n \to \infty} \lim_{x \to 1} f_n(x). \]

Similar phenomena might occur in connection with the integrability and
differentiability of the functions $f_n(x)$. That is, under pointwise convergence,
the limit of a sequence of integrable or differentiable functions may not be
integrable or differentiable, respectively. Illustrative examples are given in
Exercises 1 and 2 in Sect. 3.2.


3.2.2 Uniform Convergence 


We know that if the sequence $(f_n(x))$ is pointwise convergent to $f(x)$ on
$x \in D$, it is possible to choose $N(x)$ for any small $\varepsilon$ such that

\[ m > N(x) \;\Longrightarrow\; |f_m(x) - f(x)| < \varepsilon. \tag{3.6} \]

In general, the least value of $N(x)$ that satisfies (3.6) will depend on $x$. But in
certain cases, we can choose $N$ independent of $x$ such that $|f_m(x) - f(x)| < \varepsilon$
for all $m > N$ and for all $x$ over the domain $D$. If this is true for any small
$\varepsilon$, the sequence $(f_n(x))$ is said to converge uniformly to $f(x)$ on $D$. The
formal definition is given below.


@ Uniform convergence:

The sequence $(f_n)$ of real functions on $D \subset \mathbb{R}$ converges uniformly
to a function $f$ on $D$ if, given $\varepsilon > 0$, there is a positive integer $N = N(\varepsilon)$
(which depends on $\varepsilon$) such that

\[ n > N \;\Longrightarrow\; |f_n(x) - f(x)| < \varepsilon \quad \text{for all } x \in D. \]


Emphasis is placed on the fact that the integer $N = N(\varepsilon, x)$ in the pointwise
convergence depends on $x$ in general, whereas $N = N(\varepsilon)$ in the uniform
convergence is independent of $x$. Under uniform convergence, therefore, by
taking $n$ large enough we can always force the graph of $y = f_n(x)$ into a band
of width less than $2\varepsilon$ centered around the graph of $y = f(x)$ over the whole
domain $D$ (see Fig. 3.4).


Fig. 3.4. A function $y = f_n(x)$ contained overall within a band of width less than
$2\varepsilon$


The definition of uniform convergence noted above is equivalent to the 
following statement. 


@ Theorem:
The sequence $(f_n)$ of real functions on $D \subset \mathbb{R}$ converges uniformly to $f$
on $D$ if and only if

\[ \sup_{x \in D} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty. \]
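The supremum criterion lends itself to a rough numerical check. The sketch
below is an illustration under stated assumptions: the helper names `grid` and
`sup_distance` are invented here, and a finite sample can only give a lower
bound for the true supremum. It compares $f_n(x) = x^n$ with the zero function
on $[0, 0.9]$, where the sampled supremum shrinks like $0.9^n$, and on $[0, 1)$,
where it stays close to 1 for the values of $n$ shown.

```python
def grid(a, b, m=10_000, include_right=True):
    """Equally spaced sample points of [a, b]; the right end is dropped for [a, b)."""
    last = m if include_right else m - 1
    return [a + (b - a) * i / m for i in range(last + 1)]


def sup_distance(f_n, f, points):
    """Largest sampled value of |f_n(x) - f(x)|; a lower bound for the true supremum."""
    return max(abs(f_n(x) - f(x)) for x in points)


if __name__ == "__main__":
    zero = lambda x: 0.0
    for n in (10, 50, 200):
        power = lambda x, n=n: x ** n
        on_short = sup_distance(power, zero, grid(0.0, 0.9))
        on_open = sup_distance(power, zero, grid(0.0, 1.0, include_right=False))
        print(f"n = {n}: [0, 0.9] -> {on_short:.3e},  [0, 1) -> {on_open:.3f}")
```

Because the sample is finite, the value reported on $[0, 1)$ would eventually
drop for very large $n$; a finer grid near $x = 1$ restores it, in line with
the exact supremum being 1 for every $n$.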


3.2.3 Cauchy Criterion for Series of Functions 
As in the case of real sequences, the Cauchy criterion is available for testing 


uniform convergence for sequences of functions. 


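@ Cauchy criterion for uniform convergence:
The sequence $(f_n)$ defined on $D \subset \mathbb{R}$ converges uniformly to $f$ on $D$ if
and only if, given $\varepsilon > 0$, there is a positive integer $N = N(\varepsilon)$ such that

\[ m, n > N \;\Longrightarrow\; |f_m(x) - f_n(x)| < \varepsilon \quad \text{for all } x \in D, \tag{3.7} \]

or equivalently,

\[ m, n > N \;\Longrightarrow\; \sup_{x \in D} |f_m(x) - f_n(x)| < \varepsilon. \]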


Proof Suppose that $f_n(x)$ converges uniformly to $f(x)$ on $D$. Let $\varepsilon > 0$ and
choose $N \in \mathbb{N}$ such that

\[ n > N \;\Longrightarrow\; |f_n(x) - f(x)| < \frac{\varepsilon}{2} \quad \text{for all } x \in D. \]

If $m, n > N$, we have

\[ |f_n(x) - f_m(x)| \le |f_n(x) - f(x)| + |f(x) - f_m(x)| < \varepsilon \quad \text{for all } x \in D. \]

This result implies that if $f_n(x)$ is uniformly convergent to $f(x)$ on $D$, there
exists an $N$ that satisfies (3.7) for any small $\varepsilon$.

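Next we consider the converse. Suppose that $(f_n)$ satisfies the criterion
given by (3.7). Then, for each point $x \in D$, $(f_n(x))$ forms a Cauchy sequence
and thus converges pointwise to

\[ f(x) = \lim_{n \to \infty} f_n(x) \quad \text{for all } x \in D. \]

We now show that this convergence is uniform. Let $n > N$ be fixed and take
the limit $m \to \infty$ in (3.7) to obtain

\[ n > N \;\Longrightarrow\; |f_n(x) - f(x)| \le \varepsilon \quad \text{for all } x \in D, \]

where $N$ is independent of $x$. (The strict inequality in (3.7) may weaken to
$\le$ in the limit, which is harmless because $\varepsilon$ is arbitrary.) From this we
conclude that the convergence of $(f_n)$ to $f$ is uniform. ♣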


3.2.4 Continuity of the Limit Function 


The most important feature of uniform convergence is that it overcomes some
of the shortcomings of pointwise convergence demonstrated in Sect. 3.2.1; i.e.,
pointwise convergence does not preserve continuity, integrability, and
differentiability of the functions $f_n(x)$. We now examine the situation under
uniform convergence, starting with the continuity of $f_n(x)$.


@ Theorem:
If $f_n$ converges uniformly to $f$ on $D \subset \mathbb{R}$ and each $f_n$ is continuous at
$c \in D$, then so is $f$.


Remark. Note that the uniform convergence of $f_n$ on $D$ is a sufficient, but not
a necessary, condition for $f$ to be continuous. In fact, if $f_n$ is not uniformly
convergent on $D$, then its limit $f$ may or may not be continuous at $c \in D$.


For the proof, it suffices to see that

\[ \lim_{x \to c} f(x) = \lim_{x \to c} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to c} f_n(x) = \lim_{n \to \infty} f_n(c) = f(c), \tag{3.8} \]

which guarantees the continuity of the limit function $f(x)$ at $x = c$. In (3.8),
we have used the interchangeability of limiting processes expressed by

\[ \lim_{x \to c} \lim_{n \to \infty} f_n(x) = \lim_{n \to \infty} \lim_{x \to c} f_n(x), \]

which follows from the lemma below.


@ Lemma:
Let $c$ be a limit point of $D \subset \mathbb{R}$ and assume that $f_n$ converges uniformly
to $f$ on $D \setminus \{c\}$. If

\[ \lim_{x \to c} f_n(x) = \ell_n \tag{3.9} \]

exists for each $n$, then
(i) $(\ell_n)$ is convergent, and
(ii) $\lim_{x \to c} f(x)$ exists and coincides with $\lim_{n \to \infty} \ell_n$; i.e.,

\[ \lim_{n \to \infty} \lim_{x \to c} f_n(x) = \lim_{x \to c} \lim_{n \to \infty} f_n(x). \tag{3.10} \]


Proof Let $\varepsilon > 0$. Since $(f_n)$ converges uniformly on $D \setminus \{c\}$, it satisfies the
Cauchy criterion; i.e., there is a positive integer $N$ such that

\[ m, n > N \;\Longrightarrow\; |f_n(x) - f_m(x)| < \varepsilon \quad \text{for all } x \in D \setminus \{c\}. \tag{3.11} \]

Take the limit $x \to c$ in (3.11) to obtain

\[ m, n > N \;\Longrightarrow\; |\ell_n - \ell_m| \le \varepsilon. \tag{3.12} \]

This implies that $(\ell_n)$ is a Cauchy sequence and thus convergent, which proves
statement (i) above.


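To prove (ii), let

\[ \ell = \lim_{n \to \infty} \ell_n. \]

Fix the index $n$ at a value allowed by (3.11), denote it again by $N$ for brevity,
and let $m \to \infty$ in (3.9), (3.11), and (3.12) to get the following results:

\[ \lim_{x \to c} f_N(x) = \ell_N, \tag{3.13} \]
\[ |f_N(x) - f(x)| \le \varepsilon \quad \text{for all } x \in D \setminus \{c\}, \tag{3.14} \]

and

\[ |\ell_N - \ell| \le \varepsilon. \tag{3.15} \]

In addition, the existence of (3.13) implies that there exists a $\delta > 0$ such that

\[ |x - c| < \delta \text{ with } x \in D \setminus \{c\} \;\Longrightarrow\; |f_N(x) - \ell_N| < \varepsilon. \tag{3.16} \]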


Using (3.14), (3.15) and (3.16), we obtain

\[ |x - c| < \delta \text{ with } x \in D \setminus \{c\} \;\Longrightarrow\; |f(x) - \ell| \le |f(x) - f_N(x)| + |f_N(x) - \ell_N| + |\ell_N - \ell| < 3\varepsilon. \]

This means that

\[ \lim_{x \to c} f(x) = \ell, \]

which is equivalent to the desired result of (3.10). ♣


Remark. The contraposition of the theorem tells us that if the limit function
$f$ of a sequence of continuous functions is discontinuous, the convergence of
$f_n$ is not uniform. The example in Sect. 3.2.1 demonstrated such a sequence.


3.2.5 Integrability of the Limit Function 


We know that the limit function $f(x)$ is continuous if the sequence
$(f_n(x))$ of continuous functions is uniformly convergent. This immediately
results in the following theorem.


@ Theorem:

Suppose $f_n$ is integrable on $[a, b]$ for each $n$. Then, if $f_n$ converges
uniformly to $f$ on $[a, b]$, the limit function $f$ is also integrable, so that

\[ \int_a^b f(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx, \tag{3.17} \]

or equivalently,

\[ \int_a^b \lim_{n \to \infty} f_n(x)\,dx = \lim_{n \to \infty} \int_a^b f_n(x)\,dx. \]


Proof Since $f_n$ for every $n$ is integrable on $[a, b]$, it is continuous (piecewise,
at least) on $[a, b]$. Thus $f(x)$ is also continuous (piecewise at least) on $[a, b]$ in
view of the theorem given in Sect. 3.2.4, so that $f(x)$ is integrable on $[a, b]$.
Furthermore, we observe that

\[
\left| \int_a^b f_n(x)\,dx - \int_a^b f(x)\,dx \right|
\le \int_a^b |f_n(x) - f(x)|\,dx
\le \int_a^b \sup_{x \in [a,b]} |f_n(x) - f(x)|\,dx
\le (b - a) \sup_{x \in [a,b]} |f_n(x) - f(x)|.
\]


The uniform convergence of $(f_n)$ ensures that

\[ \sup_{x \in [a,b]} |f_n(x) - f(x)| \to 0 \quad \text{as } n \to \infty, \]

which immediately gives the desired result shown in (3.17). ♣


Remark. 


1. Note again that uniform convergence is a sufficient but not a necessary
condition for (3.17) to be valid, so (3.17) may be valid even in the absence
of uniform convergence. For instance, the convergence of $(f_n)$ with $f_n(x) =
x^n$ on $[0, 1]$ is not uniform but we have

\[ \lim_{n \to \infty} \int_0^1 f_n(x)\,dx = \lim_{n \to \infty} \frac{1}{n+1} = 0 = \int_0^1 f(x)\,dx. \]


2. The conditions on $f_n$ stated in the theorem will be significantly relaxed
when we take up the Lebesgue integral in Chap. 6.


3.2.6 Differentiability of the Limit Function 


After the last two subsections, readers may expect that results for
differentiability will be similar to those for continuity and integrability; i.e.,
they may be tempted to conclude that the differentiability of the functions $f_n(x)$
will be preserved if $(f_n)$ converges uniformly to $f$. However, this is not the case.
In fact, even if $f_n$ converges uniformly to $f$ on $[a, b]$ and $f_n$ is differentiable
at $c \in [a, b]$, it may occur that

\[ \lim_{n \to \infty} f_n'(c) \ne f'(c). \]
Consider the following example: 


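Examples Suppose the sequence $(f_n)$ is defined by

\[ f_n(x) = \sqrt{x^2 + \frac{1}{n^2}}, \quad x \in [-1, 1]. \tag{3.18} \]

Clearly (3.18) is differentiable for each $n$, and the sequence $(f_n)$ converges
uniformly on $[-1, 1]$ to

\[ f(x) = |x| \tag{3.19} \]

since

\[
|f_n(x) - f(x)| = \sqrt{x^2 + \frac{1}{n^2}} - \sqrt{x^2}
= \frac{1/n^2}{\sqrt{x^2 + 1/n^2} + \sqrt{x^2}}
\le \frac{1}{n} \to 0 \quad \text{for all } x \in [-1, 1].
\]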


However, the limit function $f$ of (3.19) is not differentiable at $x = 0$. Hence,
the desired result

\[ \lim_{n \to \infty} f_n'(x) = f'(x) \tag{3.20} \]

breaks down at $x = 0$.
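A short numerical experiment makes the breakdown visible. The Python sketch
below assumes that (3.18) reads $f_n(x) = \sqrt{x^2 + 1/n^2}$; the function
names and the sampling grid are choices made here for illustration. It shows
that the sampled supremum of $|f_n(x) - |x||$ equals $1/n$ (attained at $x = 0$),
so the convergence is uniform, while $f_n'(0) = 0$ for every $n$ even though the
slopes just to the right of the origin tend to 1.

```python
import math


def f_n(x, n):
    """Assumed form of (3.18): f_n(x) = sqrt(x^2 + 1/n^2)."""
    return math.sqrt(x * x + 1.0 / n ** 2)


def df_n(x, n):
    """Derivative of f_n, obtained by the chain rule."""
    return x / math.sqrt(x * x + 1.0 / n ** 2)


def sup_error(n, m=10_000):
    """Sampled sup of |f_n(x) - |x|| on [-1, 1]; the grid contains x = 0 for even m."""
    points = [-1.0 + 2.0 * i / m for i in range(m + 1)]
    return max(abs(f_n(x, n) - abs(x)) for x in points)


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(f"n = {n}: sup error = {sup_error(n):.6f}, 1/n = {1 / n:.6f}, "
              f"f_n'(0) = {df_n(0.0, n)}, f_n'(0.01) = {df_n(0.01, n):.4f}")
```

The derivatives $f_n'$ therefore converge pointwise to a function that jumps
from -1 to +1 across the origin with the value 0 in between, which cannot be
the derivative of $|x|$ at $x = 0$ because that derivative does not exist.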
The following theorem provides sufficient conditions for (3.20) to be
satisfied. The important point is that it requires the uniform convergence of the
derivatives $f_n'$, not of the functions $f_n$ themselves.


@ Theorem:

Suppose $(f_n)$ to be a sequence of differentiable functions on $[a, b]$ that
converges at a certain point $x_0 \in [a, b]$. If the sequence $(f_n')$ is uniformly
convergent on $[a, b]$, then

(i) $(f_n)$ is also uniformly convergent on $[a, b]$ to $f$,
(ii) $f$ is differentiable on $[a, b]$, and
(iii) $\lim_{n \to \infty} \frac{d}{dx} f_n(x) = \frac{d}{dx} f(x)$.


Proof Let $\varepsilon > 0$. From the convergence of $(f_n(x_0))$ and the uniform
convergence of $(f_n')$, we conclude that there is an $N \in \mathbb{N}$ such that

\[ m, n > N \;\Longrightarrow\; |f_n'(x) - f_m'(x)| < \varepsilon \quad \text{for all } x \in [a, b] \tag{3.21} \]

and

\[ m, n > N \;\Longrightarrow\; |f_n(x_0) - f_m(x_0)| < \varepsilon. \tag{3.22} \]

Given any two points $x, t \in [a, b]$, it follows from the mean value theorem
applied to $f_n - f_m$ that there is a point $c$ between $x$ and $t$ such that

\[ f_n(x) - f_m(x) - [f_n(t) - f_m(t)] = (x - t)\,[f_n'(c) - f_m'(c)]. \]

Using (3.21), we have

\[ m, n > N \;\Longrightarrow\; |f_n(x) - f_m(x) - [f_n(t) - f_m(t)]| \le \varepsilon |x - t|. \tag{3.23} \]

From (3.22) and (3.23), it follows that

\[
|f_n(x) - f_m(x)| \le |f_n(x) - f_m(x) - [f_n(x_0) - f_m(x_0)]| + |f_n(x_0) - f_m(x_0)|
< \varepsilon |x - x_0| + \varepsilon
\le \varepsilon (b - a + 1) = C\varepsilon \quad \text{for all } x \in [a, b],
\]

which means that $(f_n)$ converges uniformly to some limit $f$. Hence, statement
(i) has been proven.


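Next we consider the proofs of (ii) and (iii). For any fixed point $x \in [a, b]$,
define

\[ \varphi_n(t) = \frac{f_n(t) - f_n(x)}{t - x}, \quad t \in [a, b] \setminus \{x\} \]

and

\[ \varphi(t) = \frac{f(t) - f(x)}{t - x}, \quad t \in [a, b] \setminus \{x\}. \]

Clearly, $\varphi_n \to \varphi$ as $n \to \infty$; furthermore, if $m, n > N$, the result of (3.23) tells
us that

\[ |\varphi_n(t) - \varphi_m(t)| \le \varepsilon \quad \text{for all } t \in [a, b] \setminus \{x\}. \]

Thus in view of the Cauchy criterion, we see that $\varphi_n$ converges uniformly to
$\varphi$ on $[a, b] \setminus \{x\}$. Now we observe that

\[ \lim_{t \to x} \varphi_n(t) = f_n'(x) \quad \text{for all } n \in \mathbb{N}. \tag{3.24} \]

Then, the uniform convergence of $\varphi_n$ allows us to take the limit $n \to \infty$ in (3.24)
and to interchange the order of the limiting processes (see the lemma in
Sect. 3.2.4), which yields

\[ f'(x) = \lim_{t \to x} \varphi(t) = \lim_{t \to x} \lim_{n \to \infty} \varphi_n(t) = \lim_{n \to \infty} \lim_{t \to x} \varphi_n(t) = \lim_{n \to \infty} f_n'(x). \]

This proves that $f$ is differentiable at $x$ and that

\[ f'(x) = \lim_{n \to \infty} f_n'(x). \quad ♣ \]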
Remark. That the uniform convergence of $(f_n')$ is just sufficient, not necessary,
is seen by considering the sequence

\[ f_n(x) = \frac{x^{n+1}}{n+1}, \quad x \in (0, 1). \]

This converges uniformly to 0, and its derivative $f_n'(x) = x^n$ also converges to
0. The conclusions (i)–(iii) given in the theorem above are thus all satisfied.
But the convergence of $(f_n')$ is not uniform.


Exercises 
1. For the function

\[ f_n(x) = n x (1 - x^2)^n, \quad x \in [0, 1], \]

check that an interchange of the order of the limiting process $n \to \infty$
and integration gives different results.


Solution: The given function is integrable for each $n$ so that

\[
\int_0^1 f_n(x)\,dx = n \int_0^1 x (1 - x^2)^n\,dx
= \left[ -\frac{n (1 - x^2)^{n+1}}{2(n+1)} \right]_0^1
= \frac{n}{2(n+1)} \to \frac{1}{2} \quad (n \to \infty).
\]

On the other hand, the limit given by

\[ f(x) = \lim_{n \to \infty} f_n(x) = 0 \quad \text{for all } x \in [0, 1] \]

yields $\int_0^1 f(x)\,dx = 0$. We thus conclude that

\[ \lim_{n \to \infty} \int_0^1 f_n(x)\,dx \ne \int_0^1 \lim_{n \to \infty} f_n(x)\,dx; \]

i.e., interchanging the order of integration and limiting processes
is not in general allowed under pointwise convergence. ♣
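The two sides of this inequality can be compared numerically. The sketch
below assumes the integrand $f_n(x) = n x (1 - x^2)^n$ and uses a plain composite
midpoint rule; the helper names and the number of subintervals are choices
made here, not taken from the text. The integrals follow $n/(2(n+1))$ toward
1/2 while the values at any fixed point of $(0, 1]$ collapse to zero.

```python
def f_n(x, n):
    """Assumed integrand of Exercise 1: f_n(x) = n x (1 - x^2)^n."""
    return n * x * (1.0 - x * x) ** n


def midpoint_integral(g, a, b, m=20_000):
    """Composite midpoint rule with m subintervals on [a, b]."""
    h = (b - a) / m
    return h * sum(g(a + (i + 0.5) * h) for i in range(m))


if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        numeric = midpoint_integral(lambda x: f_n(x, n), 0.0, 1.0)
        exact = n / (2 * (n + 1))
        print(f"n = {n}: integral = {numeric:.6f}, n/(2(n+1)) = {exact:.6f}, "
              f"f_n(0.5) = {f_n(0.5, n):.3e}")
```

The area concentrates in an ever narrower and taller bump near $x = 0$, which
is why the pointwise limit loses track of it.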


2. For $f_n(x)$ given by

\[
f_n(x) = \begin{cases}
-1 & x \le -\frac{1}{n}, \\
\sin\left(\frac{n \pi x}{2}\right) & -\frac{1}{n} < x < \frac{1}{n}, \\
1 & x \ge \frac{1}{n},
\end{cases}
\]

check the continuity of its limit $f(x) = \lim_{n \to \infty} f_n(x)$ at $x = 0$.


Solution: $f_n(x)$ is differentiable for any $x \in \mathbb{R}$ for all $n$, and thus
is continuous at $x = 0$ for all $n$. However, its limit,

\[
f(x) = \begin{cases}
-1 & x < 0, \\
0 & x = 0, \\
1 & x > 0
\end{cases}
\]

is not continuous at $x = 0$. Hence, for the sequence of functions
$\{f_n(x)\}$, the order of the limiting process $n \to \infty$ and the
differentiation with respect to $x$ is not interchangeable. ♣
3. Show that the sequence of functions $(f_n(x))$ defined by

\[ f_n(x) = n x e^{-nx} \tag{3.25} \]

converges uniformly to $f(x) = 0$ on $x \ge a$ for every $a > 0$.


Solution: In view of the previous theorem, we show that

\[ \sup\{ f_n(x) : x \ge a \} \to 0 \quad \text{as } n \to \infty, \]

where $a > 0$. To prove this, we consider the derivative

\[ f_n'(x) = n e^{-nx} (1 - nx). \tag{3.26} \]

It follows from (3.26) that $x = 1/n$ is the only critical point of $f_n$.
Now we choose a positive integer $N$ such that $a > 1/N$. Then, the
function $f_n$ for each $n > N$ has no critical point on $x \ge a$, and
is monotonically decreasing there. Therefore, the maximum of $f_n(x)$
on $x \ge a$ is attained at $x = a$ for any $n > N$, with the result that

\[ \sup_{x \in [a, \infty)} f_n(x) = f_n(a) = n a e^{-na} \to 0 \quad (n \to \infty). \]

This holds for any $a > 0$; hence, we conclude that $f_n$ converges
uniformly to 0 on $[a, \infty)$ for every $a > 0$. ♣


Remark. Note that the convergence of (3.25) is uniform on each interval
$[a, \infty)$ with $a > 0$, but not on the whole half-line $x > 0$ (nor on $[0, \infty)$).
Since the maximum point $x = 1/n$ lies in $(0, \infty)$ for every $n$, we have

\[ \sup_{x \in (0, \infty)} f_n(x) = f_n\left(\frac{1}{n}\right) = \frac{1}{e} \not\to 0, \]

so $(f_n)$ does not converge uniformly on $(0, \infty)$, and hence not on
$[0, \infty)$ either.
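The role of the left end point can be tabulated directly. The sketch below
assumes $f_n(x) = n x e^{-nx}$ and uses the fact, read off from the sign of the
derivative, that $f_n$ increases up to $x = 1/n$ and decreases afterwards; the
helper name `sup_on_half_line` is invented here. The supremum over $[a, \infty)$
with $a > 0$ tends to zero, whereas with $a = 0$ it stays at $1/e$.

```python
import math


def f_n(x, n):
    """Assumed form of (3.25): f_n(x) = n x exp(-n x)."""
    return n * x * math.exp(-n * x)


def sup_on_half_line(a, n):
    """Supremum of f_n over [a, inf) for a >= 0.

    f_n rises on [0, 1/n] and falls on [1/n, inf), so the maximiser is max(a, 1/n).
    """
    return f_n(max(a, 1.0 / n), n)


if __name__ == "__main__":
    for n in (1, 5, 20, 100, 500):
        print(f"n = {n}: sup over [0.1, inf) = {sup_on_half_line(0.1, n):.3e}, "
              f"sup over [0, inf) = {sup_on_half_line(0.0, n):.6f}")
```

For $n \le 10$ the maximiser $1/n$ still lies inside $[0.1, \infty)$, so both columns
agree; only once $1/n$ drops below the left end point does the supremum start
to decay.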


3.3 Series of Real Functions 


3.3.1 Series of Functions 


We close this chapter by considering convergence properties of series of
real-valued functions. Assume a sequence $(f_n)$ of functions defined on $D \subset \mathbb{R}$. By
analogy with series of real numbers, we can define a series of functions by

\[ S_n(x) = \sum_{k=1}^{n} f_k(x), \quad x \in D, \]

which gives a sequence $(S_n) = (S_1, S_2, \cdots)$.

As $n$ increases, the sequence $(S_n)$ may or may not converge to a finite
value, depending on the feature of the functions $f_k(x)$ as well as the point $x$
in question. If the sequence converges for each point $x \in D$ (i.e., converges
pointwise on $D$), then the limit of $S_n$ is called the sum of the infinite series
of functions $f_k(x)$ and is denoted by

\[ S(x) = \lim_{n \to \infty} S_n(x) = \sum_{k=1}^{\infty} f_k(x), \quad x \in D. \]

It is obvious that the convergence of the series $S_n(x)$ implies the pointwise
convergence $\lim_{n \to \infty} f_n(x) = 0$ on $D$. A series $(S_n)$ that does not converge at
a point $x \in D$ is said to diverge at that point.

Applied to series of functions, the Cauchy criterion for uniform convergence
takes the following form:


@ Cauchy criterion for series of functions:
The series $S_n$ is uniformly convergent on $D$ if and only if for every small
$\varepsilon > 0$, there is a positive integer $N$ such that

\[ n > m > N \;\Longrightarrow\; |S_n(x) - S_m(x)| = \left| \sum_{k=m+1}^{n} f_k(x) \right| < \varepsilon \quad \text{for all } x \in D. \]


Set $n = m + 1$ in the above criterion to obtain

\[ n > N + 1 \;\Longrightarrow\; |f_n(x)| < \varepsilon \quad \text{for all } x \in D. \]

This result implies that the uniform convergence $f_n(x) \to 0$ on $D$ is a
necessary condition for the convergence of $S_n(x)$ to be uniform on $D$. We will
use this criterion when proving a more practical test for uniform convergence
known as the Weierstrass M-test, which is presented in Sect. 3.3.3.


3.3.2 Properties of Uniformly Convergent Series of Functions 


When a given series of functions $\sum f_k(x)$ is uniformly convergent, the
properties of the sum $S(x)$ in terms of continuity, integrability, and differentiability
can be easily inferred from the properties of the separate terms $f_k(x)$. In fact,
applying the theorems given in Sects. 3.2.4-3.2.6 to the sequence $(S_n)$ and
using the linearity regarding the limiting process, integration, and
differentiation, we obtain the parallel theorems shown below.


@ Continuity of the sum:
Suppose $f_k(x)$ to be continuous for each $k$. If the sequence $(S_n)$ of the
series

\[ S_n(x) = \sum_{k=1}^{n} f_k(x) \]

converges uniformly to $S(x)$, then $S(x)$ is also continuous, so that

\[ \lim_{t \to x} S(t) = \lim_{t \to x} \sum_{k=1}^{\infty} f_k(t) = \sum_{k=1}^{\infty} \lim_{t \to x} f_k(t). \]

@ Integrability of the sum:
Suppose $f_k$ to be integrable on $[a, b]$ for all $k$. If $(S_n)$ converges uniformly
to $S$ on $[a, b]$, we have

\[ \int_a^b S(x)\,dx = \int_a^b \sum_{k=1}^{\infty} f_k(x)\,dx = \sum_{k=1}^{\infty} \int_a^b f_k(x)\,dx. \]


@ Differentiability of the sum:

Let $f_k$ be differentiable on $[a, b]$ for each $k$ and suppose that $(S_n)$
converges to $S$ at some point $x_0 \in [a, b]$. If the series $\sum f_k'$ is uniformly
convergent on $[a, b]$, then $S_n(x)$ is also uniformly convergent on $[a, b]$ and the
sum $S(x)$ is differentiable on $[a, b]$, so that

\[ \frac{d}{dx} S(x) = \frac{d}{dx} \sum_{k=1}^{\infty} f_k(x) = \sum_{k=1}^{\infty} \frac{d}{dx} f_k(x) \quad \text{for all } x \in [a, b]. \]


Observe that the second and third theorems provide a sufficient condition for
performing term-by-term integration and differentiation, respectively, of an
infinite series of functions. Without uniform convergence, such term-by-term
calculations are not guaranteed to give the correct result.


3.3.3 Weierstrass M-test 


The following is a very useful and simple test for the uniform convergence of 
a series of functions. 


@ Weierstrass M test: If there is a sequence of positive constants $M_k$
such that

\[ |f_k(x)| \le M_k \tag{3.27} \]

for any $x$ on the interval $[a, b]$, and if the series

\[ \sum_{k=0}^{\infty} M_k \tag{3.28} \]

converges, then the series of functions $\sum_{k=0}^{\infty} f_k(x)$ converges uniformly on
$x \in [a, b]$.


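Proof Since (3.28) converges, it follows from the Cauchy criterion that for any
$\varepsilon > 0$ there exists a number $N$ such that

\[ n > m > N \;\Longrightarrow\; \left| \sum_{k=0}^{n} M_k - \sum_{k=0}^{m} M_k \right| = \sum_{k=m+1}^{n} M_k < \varepsilon. \tag{3.29} \]

Furthermore, in view of the inequality rule for absolute values of sums and
the relation (3.27), it follows that

\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| \le \sum_{k=m+1}^{n} |f_k(x)| \le \sum_{k=m+1}^{n} M_k \tag{3.30} \]

for all $x \in [a, b]$. Note that the left-hand term in (3.30) can be rewritten as

\[ \left| \sum_{k=m+1}^{n} f_k(x) \right| = \left| \sum_{k=0}^{n} f_k(x) - \sum_{k=0}^{m} f_k(x) \right|. \tag{3.31} \]

From (3.29), (3.30), and (3.31), it follows that

\[ n > m > N \;\Longrightarrow\; \left| \sum_{k=0}^{n} f_k(x) - \sum_{k=0}^{m} f_k(x) \right| < \varepsilon \quad \text{for all } x \in [a, b], \]

which clearly indicates the uniform convergence of $\sum f_k(x)$ on $[a, b]$. ♣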
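The chain (3.29)-(3.31) can be observed on a concrete series. The sketch
below uses a hypothetical example chosen here, $f_k(x) = \cos(kx)/k^2$ with
$M_k = 1/k^2$ and the sum started at $k = 1$ to avoid division by zero; none of
these choices comes from the text. It checks that the difference of two
partial sums never exceeds the corresponding block of the majorant series,
whatever the value of $x$.

```python
import math


def partial_sum(x, n):
    """S_n(x) = sum_{k=1}^{n} cos(k x) / k^2 (hypothetical example series)."""
    return sum(math.cos(k * x) / k ** 2 for k in range(1, n + 1))


def majorant_tail(m, n):
    """sum_{k=m+1}^{n} M_k with M_k = 1/k^2, the x-independent bound of (3.30)."""
    return sum(1.0 / k ** 2 for k in range(m + 1, n + 1))


if __name__ == "__main__":
    m, n = 100, 2000
    bound = majorant_tail(m, n)
    for x in (0.0, 0.5, 1.0, math.pi, 10.0):
        gap = abs(partial_sum(x, n) - partial_sum(x, m))
        print(f"x = {x:.4f}: |S_n - S_m| = {gap:.3e} <= {bound:.3e}")
```

The bound depends on $m$ and $n$ only, which is exactly the uniformity in $x$ that
the test delivers; at $x = 0$ the two quantities coincide up to rounding.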


Exercises

1. Determine the convergence of the series $\sum_{k=0}^{\infty} x^k$.


Solution: It obviously converges to $1/(1 - x)$ on the interval
$[-a, a]$ if $0 < a < 1$. We show that this convergence is uniform on
$[-a, a]$ for any $0 < a < 1$. A partial sum yields $S_n(x) = \sum_{k=0}^{n-1} x^k =
(1 - x^n)/(1 - x)$, so that

\[ \left| \frac{1}{1 - x} - S_n(x) \right| = \frac{|x|^n}{|1 - x|} \le \frac{a^n}{1 - a} \quad \text{for } |x| \le a. \]

Since $0 < a < 1$, the last term decreases monotonically to zero with $n$;
hence, for a given $\varepsilon > 0$, we can find an $N$ such that $n > N \Longrightarrow
a^n/(1 - a) < \varepsilon$. Clearly the value of $N$ does not depend on $x$.
Therefore, we conclude that the infinite series $\sum x^k$ is uniformly
convergent on $[-a, a]$ with $0 < a < 1$. ♣
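The $x$-independent choice of $N$ can be computed explicitly. The sketch below
is an illustration with names invented here (`partial_sum`, `uniform_bound`,
`n_for_tolerance`); it takes $S_n(x) = \sum_{k=0}^{n-1} x^k$ as in the solution,
finds the smallest $n$ with $a^n/(1 - a)$ below a tolerance, and confirms on a few
sample points of $[-a, a]$ that the actual error respects the bound.

```python
def partial_sum(x, n):
    """S_n(x) = sum_{k=0}^{n-1} x^k, the partial sum used in the solution."""
    return sum(x ** k for k in range(n))


def uniform_bound(a, n):
    """a^n / (1 - a): bound on |1/(1 - x) - S_n(x)| valid for all |x| <= a < 1."""
    return a ** n / (1.0 - a)


def n_for_tolerance(a, eps):
    """Smallest n with a^n / (1 - a) < eps; it depends on a and eps, not on x."""
    n = 0
    while uniform_bound(a, n) >= eps:
        n += 1
    return n


if __name__ == "__main__":
    a, eps = 0.5, 1e-3
    n = n_for_tolerance(a, eps)
    print(f"a = {a}, eps = {eps}: n = {n}, bound = {uniform_bound(a, n):.3e}")
    for x in (-0.5, -0.25, 0.0, 0.3, 0.5):
        error = abs(1.0 / (1.0 - x) - partial_sum(x, n))
        print(f"  x = {x:+.2f}: error = {error:.3e}")
```

As $a$ approaches 1 the required $n$ grows without bound, reflecting that the
convergence is not uniform on the whole open interval $(-1, 1)$.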


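2. Determine the convergence of the series $\sum_{k=0}^{\infty} (1 - x) x^k$.


Solution: This converges to

\[ S(x) = \begin{cases} 1, & \text{for } 0 \le x < 1 \\ 0, & \text{for } x = 1 \end{cases} \]

but not uniformly. Actually, the partial sum $S_n(x) = \sum_{k=0}^{n-1} (1 - x) x^k
= 1 - x^n$ gives

\[ |S(x) - S_n(x)| = x^n \quad (0 \le x < 1), \]

and if $\varepsilon = 1/4$, for instance, the inequality $x^n < 1/4$ cannot hold for
all $0 < x < 1$ for any fixed $n$ because $x^n \to 1$ as $x \to 1$. ♣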


3. Examine the uniform convergence of the series $\sum_{k=1}^{\infty} f_k(x)$, where

(i) $f_k(x) = \frac{\sin kx}{k^2}$, (ii) $f_k(x) = \sin\left(\frac{x}{k^2}\right)$, and (iii) $f_k(x) = \frac{1}{k^2 x^2}$.


Solution:

(i) The series converges uniformly for every real $x$. Check this
by taking $M_k = 1/k^2$.

(ii) Let $D$ be a subset of $\mathbb{R}$ bounded by $c$, i.e., $|x| < c$ for all
$x \in D$. Then we have

\[ \left| \sin\left(\frac{x}{k^2}\right) \right| \le \frac{|x|}{k^2} < \frac{c}{k^2} \quad \text{for all } x \in D. \]

Taking $M_k = c/k^2$ and noting that $\sum M_k$ is convergent, we
conclude that $\sum f_k$ is uniformly convergent on any bounded
subset of $\mathbb{R}$. Notably, however, this uniform convergence
disappears when we extend the domain $D$ to the whole $\mathbb{R}$. This
is seen by noting that $f_k \to 0$ pointwise on $\mathbb{R}$, but

\[ \sup_{x \in \mathbb{R}} |f_k(x)| \ge \left| \sin\left(\frac{\pi k^2 / 2}{k^2}\right) \right| = 1 \not\to 0, \]

which means that the convergence of $(f_k)$ to 0 is not uniform
on $\mathbb{R}$. In view of the theorem in Sect. 3.3.1, therefore, the
series $\sum f_k$ fails to converge uniformly on $\mathbb{R}$.

(iii) The series $\sum_{k=1}^{\infty} 1/(k^2 x^2)$ clearly converges pointwise on the
open set $\mathbb{R} \setminus \{0\}$. Now let $c > 0$. For all $x \in \mathbb{R}$ such that $|x| \ge c$,
we have $|f_k(x)| \le 1/(k^2 c^2)$ for all $k$. Since $\sum_{k=1}^{\infty} 1/(k^2 c^2)$ is
convergent, the series $\sum f_k$ converges uniformly, by the M-test,
on the closed set $\mathbb{R} \setminus (-c, c) = (-\infty, -c] \cup [c, \infty)$ for all
$c > 0$. But, although $f_k \to 0$ pointwise on $\mathbb{R} \setminus \{0\}$, we have
$\sup_{x \ne 0} |f_k(x)| \ge |f_k(1/k)| = 1 \not\to 0$. Hence, $(f_k)$ does not
converge uniformly to 0 on $\mathbb{R} \setminus \{0\}$, so the series $\sum f_k$ does
not converge uniformly on $\mathbb{R} \setminus \{0\}$. ♣


3.4 Improper Integrals 


3.4.1 Definitions 


Suppose that a given function $f(x)$ is integrable on every open subinterval
of $(a, b)$. We try to perform the integration $\int_a^b f(x)\,dx$ under the following
conditions:


1. $f(x)$ is unbounded in a neighborhood of $x = a$ or $x = b$.
2. The interval $(a, b)$ itself is unbounded.


In Case 1, we define a definite integral,

\[ \int_a^b f(x)\,dx = \lim_{X \to b-0} \int_a^X f(x)\,dx, \]

if $f(x)$ is bounded and integrable on every finite interval $(a, X)$ for $a < X < b$.
Similarly, if $f(x)$ is bounded and integrable on every $(X, b)$ for $a < X < b$, we
can define

\[ \int_a^b f(x)\,dx = \lim_{X \to a+0} \int_X^b f(x)\,dx. \]

These definite integrals are called improper integrals. Straightforward
extensions of these results to Case 2 yield the other improper integrals:

\[ \int_a^{\infty} f(x)\,dx = \lim_{X \to \infty} \int_a^X f(x)\,dx \]

and

\[ \int_{-\infty}^{b} f(x)\,dx = \lim_{X \to -\infty} \int_X^b f(x)\,dx. \]


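Examples 1. The improper integral $\int_1^{\infty} \frac{dx}{x^2}$ has the value 1 since

\[ \int_1^{\infty} \frac{dx}{x^2} = \lim_{X \to \infty} \int_1^X \frac{dx}{x^2} = \lim_{X \to \infty} \left(1 - \frac{1}{X}\right) = 1. \]


2. The improper integral $\int_0^{1/4} \frac{dx}{\sqrt{x}}$ has the value 1 since

\[ \int_0^{1/4} \frac{dx}{\sqrt{x}} = \lim_{\varepsilon \to +0} \int_{\varepsilon}^{1/4} \frac{dx}{\sqrt{x}} = \lim_{\varepsilon \to +0} \left(1 - 2\sqrt{\varepsilon}\right) = 1. \]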



3.4.2 Convergence of an Improper Integral 


An improper integral over $f(x)$ is said to converge if and only if the
corresponding limit exists. Furthermore, it is said to converge absolutely if and
only if the corresponding improper integral over $|f(x)|$ converges. (Keep in
mind that absolute convergence implies convergence in the ordinary sense.)
A convergent improper integral that does not converge absolutely is said to be
conditionally convergent.

An improper integral $\int f(x, y)\,dx$ converges uniformly on a set $S$ of
values of $y$ if and only if the corresponding limit converges uniformly on $S$. A
relevant theorem is given below.


@ Continuity theorem
If $f(x, y)$ is a continuous function, then the improper integral $\int f(x, y)\,dx$
is a continuous function of $y$ in every open interval where the integral
converges uniformly.


3.4.3 Principal Value Integral 


Suppose that a bounded or unbounded open or closed interval, $(a, b)$ or $[a, b]$,
contains a discrete set of points $x = c_1, c_2, \cdots$, such that $f(x)$ is unbounded in
a neighborhood of $x = c_i$ ($i = 1, 2, \cdots$). Then, the integral $\int_a^b f(x)\,dx$ may be
defined as a sum of improper integrals, introduced in the previous subsection;
i.e.,


b X2 
) f(a)dz = Peevey vic )dx + xm, . f(a)da (a<c<b), (3.32) 
b i 
| f(a)dz = pag ‘ f(a)dx + lim, fa f(a)dza (a<c<b), 
(3.33) 
foe) ; c : X2 
i f(a)dz = re - f(a)da + pee : f(a)dx (3.34) 


if the limits exists. 
Even though the integrals (3.32), (3.33) and (3.34) do not exist, the limits 
of integrals 
xX 


c—6 b 
lim , Pe: )dx and lim i f(a)da + flo 


zw—0o 




may exist. If any of these limits exist, the corresponding integral, (3.32), (3.33) 
or (3.34), is necessarily equal to its principal value integral (see Sect. 9.4.1). 


3.4.4 Conditions for Convergence 


In what follows we give the convergence criteria for improper integrals of the
form

\[ \int_a^{\infty} f(x)\,dx = \lim_{X \to \infty} \int_a^X f(x)\,dx \]

and

\[ \int_a^b f(x)\,dx = \lim_{X \to b-0} \int_a^X f(x)\,dx. \]

We assume that $f(x)$ is bounded and integrable on every bounded interval
$(a, X)$ that does not contain the upper limit of integration.


@ Cauchy’s test (= necessary and sufficient conditions for conver- 
gence): 

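The improper integral $\int_a^\infty f(x)\,dx$ converges if and only if for every
positive real number $\varepsilon$, there exists a real number $M > a$ such that

    $X_2 > X_1 > M \;\Rightarrow\; \left| \int_{X_1}^{X_2} f(x)\,dx \right| < \varepsilon.$

Similarly, $\int_a^b f(x)\,dx$ converges if and only if for every positive $\varepsilon$, there
exists a positive $\delta < b-a$ such that

    $0 < b - X_2 < b - X_1 < \delta \;\Rightarrow\; \left| \int_{X_1}^{X_2} f(x)\,dx \right| < \varepsilon.$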


A sufficient condition for an improper integral to converge uniformly is
stated below.


@ Weierstrass test 

The improper integral $\int_a^\infty f(x,y)\,dx$ [or $\int_a^b f(x,y)\,dx$] converges
uniformly and absolutely on every set $S$ of values of $y$ such that $|f(x,y)| \le g(x)$
on the interval of integration, where $g(x)$ is a real comparison function
whose integral $\int_a^\infty g(x)\,dx$ [or $\int_a^b g(x)\,dx$, respectively] converges.


Exercises 
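1. Show that the integral $\int_0^\infty \frac{\sin x}{x}\,dx$ converges.

Solution: Integrating by parts, we have

    $\int_A^{A'} \frac{\sin x}{x}\,dx = \left[ -\frac{\cos x}{x} \right]_A^{A'} - \int_A^{A'} \frac{\cos x}{x^2}\,dx,$

so that

    $\left| \int_A^{A'} \frac{\sin x}{x}\,dx \right| \le \frac{1}{A} + \frac{1}{A'} + \int_A^{A'} \frac{dx}{x^2} = \frac{2}{A} \to 0 \quad (A \to \infty).$

This completes the proof. &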


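2. Show that $\int_0^\infty \left| \frac{\sin x}{x} \right| dx$ diverges.

Solution: It follows that

    $\int_{n\pi}^{(n+1)\pi} \left| \frac{\sin x}{x} \right| dx = \int_0^\pi \frac{\sin x}{n\pi + x}\,dx > \frac{1}{(n+1)\pi} \int_0^\pi \sin x\,dx = \frac{2}{(n+1)\pi}.$

Hence, for $n > 1$ we have

    $\int_\pi^{n\pi} \left| \frac{\sin x}{x} \right| dx > \frac{2}{\pi} \sum_{k=1}^{n-1} \frac{1}{k+1} > \frac{2}{\pi} \int_2^{n+1} \frac{dx}{x} = \frac{2}{\pi} \log \frac{n+1}{2} \to \infty \quad (n \to \infty).$ &

3. Suppose that $f(x)$ is continuous within an interval $(a,b]$ and diverges at
$x = a$. Prove that $\int_a^b f(x)\,dx$ converges if $(x-a)^p |f(x)|$ is bounded on the
interval for $0 < p < 1$.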


Solution: We assume that there is an appropriate positive number $M$
such that

    $(x-a)^p |f(x)| < M$   for all $x \in (a,b]$.

Then we obtain

    $\int_{a+\varepsilon}^{b} |f(x)|\,dx < M \int_{a+\varepsilon}^{b} \frac{dx}{(x-a)^p} = M \left[ \frac{(x-a)^{1-p}}{1-p} \right]_{a+\varepsilon}^{b}$
    $= \frac{M}{1-p} \left[ (b-a)^{1-p} - \varepsilon^{1-p} \right] < \frac{M}{1-p} (b-a)^{1-p}$   (since $1-p > 0$).   (3.35)

Note that the integral on the left-hand side of (3.35) is monotonically
increasing with decreasing $\varepsilon$, since $|f(x)| \ge 0$ over the integration
interval. Yet it is bounded from above, as proved in (3.35). Hence, we
conclude that the given integral is convergent (absolutely).


4. Suppose that $f(x)$ is continuous within $[a,\infty)$ with $a > 0$ and that
$x^p |f(x)|$ is bounded there for $p > 1$. Show that the integral
$\int_a^\infty f(x)\,dx$ converges.

Solution: It follows from the hypothesis that there is a positive
number $M$ such that

    $x^p |f(x)| < M$   for all $x \ge a$.

Hence, we have for any $X > a$,

    $\int_a^X |f(x)|\,dx < M \int_a^X \frac{dx}{x^p} = \frac{M}{p-1} \left[ \frac{1}{a^{p-1}} - \frac{1}{X^{p-1}} \right] < \frac{M}{p-1} \frac{1}{a^{p-1}},$

which completes the proof. &
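
Returning to Exercises 1 and 2, the contrast between conditional and absolute convergence can be observed numerically. The following is a minimal sketch, assuming a composite Simpson rule applied period by period on $[\pi, n\pi]$; the helper names `simpson` and `partial_integrals` are introduced here for illustration. The signed partial integrals settle down, whereas the integrals of $|\sin x / x|$ keep growing and stay above the logarithmic bound $(2/\pi)\log((n+1)/2)$, which follows from the per-period estimate $2/((k+1)\pi)$.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3

def integrand(x):
    return math.sin(x) / x

def partial_integrals(n):
    """Return the integrals of sin(x)/x and |sin(x)/x| over [pi, n*pi]."""
    signed = 0.0
    absolute = 0.0
    for k in range(1, n):
        # sin x keeps one sign on [k*pi, (k+1)*pi], so |piece| equals the
        # integral of |sin x / x| over that period.
        piece = simpson(integrand, k * math.pi, (k + 1) * math.pi)
        signed += piece
        absolute += abs(piece)
    return signed, absolute

for n in (10, 100, 1000):
    signed, absolute = partial_integrals(n)
    lower_bound = (2 / math.pi) * math.log((n + 1) / 2)
    print(n, round(signed, 5), round(absolute, 5), round(lower_bound, 5))
```

A finite computation cannot prove divergence; it only makes the logarithmic growth of the second column visible.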


Part II 


Functional Analysis 


4


Hilbert Spaces 


Abstract A Hilbert space is an abstract vector space with the following two prop- 
erties: the inner product property (Sect. 4.1.3), which determines the geometry of 
the vector space, and the completeness property (Sect. 4.1.6), which guarantees the 
self-consistency of the space. Most of the mathematical topics covered in this volume 
are based on Hilbert spaces. In particular, $L^2$ spaces and $\ell^2$ spaces (Sect. 4.3), which
are specific classes of Hilbert spaces, are crucial for the formulation of the theories 
of orthonormal polynomials, Lebesgue integrals, Fourier analyses, and others, as we 
discuss in subsequent chapters. 


4.1 Hilbert Spaces 


4.1.1 Introduction 


This section provides a framework for an understanding of Hilbert spaces. 
Plainly speaking, Hilbert spaces are the generalization of familiar finite- 
dimensional spaces to the infinite-dimensional case. In fact, the geometric 
structure of Hilbert spaces is very similar to that of ordinary Euclidean geom- 
etry. This analogy comes from the fact that the concept of orthogonality can 
be introduced in any Hilbert space so that the familiar Pythagorean theorem 
holds for elements involved in the space. Moreover, owing to its generality, 
a large number of problems in physics and engineering can be successfully 
treated with a geometric point of view in Hilbert spaces. 

As we shall see later, Hilbert spaces are defined as a specific class of vector 
spaces endowed with the following two properties: inner product and com- 
pleteness. The former property leads to a rich geometric structure and the 
latter enables us to describe an element in the space in terms of a set of or- 
thonormal bases. These facts result in the possibility of establishing a wide 
variety of complete orthonormal sets of functions in Hilbert spaces;
we discuss this point in detail in Sects. 5.1 and 5.2. For a better
understanding of subsequent discussions, we provide all necessary definitions in
this section, and then describe several important consequences relevant to an
understanding of the nature of Hilbert spaces.


4.1.2 Abstract Vector Spaces 


In order to make this text self-contained, we first give a brief summary of the 
definition of vector spaces. A more precise description of vector spaces and 
some related matters will be provided in Sect. 4.2.1. 


@ Vector spaces: 
A vector space $V$ is a collection of elements called vectors, which we
denote by $x, y, \cdots$, that satisfy the following postulates:
1. There exists an operation (+) on the vectors $x$ and $y$ such that $x + y =
y + x$, where the resultant quantity $y + x$ also must be a vector.
2. There exists an identity vector (denoted by $0$) that yields $x + 0 = x$.
3. For every $x \in V$, there exists a vector $\alpha x \in V$ in which $\alpha$ is an arbitrary
scalar (real or complex). In addition,

    $\alpha(\beta x) = (\alpha\beta)x$,   $1(x) = x$   for all $x$,
    $\alpha(x + y) = \alpha x + \alpha y$,   $(\alpha + \beta)x = \alpha x + \beta x$.


Emphasis is placed on the fact that vector spaces are not limited to a set 
of geometric arrows embedded in a Euclidean space (see Sects. 4.1.3 and 
19.2.3); rather, they are general mathematical systems that have a specific 
algebraic structure. Several examples of such abstract vector spaces are given 
below. 


Examples 1. The set of all $n$-tuples of complex numbers denoted by

    $x = (\xi_1, \xi_2, \cdots, \xi_n)$

forms a vector space if the addition of vectors and the multiplication of a
vector by a scalar are defined by

    $x + y = (\xi_1, \xi_2, \cdots, \xi_n) + (\eta_1, \eta_2, \cdots, \eta_n)$
          $= (\xi_1 + \eta_1, \xi_2 + \eta_2, \cdots, \xi_n + \eta_n)$,
    $\alpha x = \alpha(\xi_1, \xi_2, \cdots, \xi_n) = (\alpha\xi_1, \alpha\xi_2, \cdots, \alpha\xi_n)$.

2. The set of all complex numbers $\{z\}$ forms a complex vector space (see
Sect. 4.2.1), where $z_1 + z_2$ and $\alpha z$ are interpreted as ordinary complex
numerical addition and multiplication.

3. The set of all polynomials in a real variable $x$, constituting the set
$\{1, x, x^2, x^3, \cdots\}$, with complex coefficients is a complex vector space if
vector addition and scalar multiplication are the ordinary addition of two
polynomials and the multiplication of a polynomial by a complex number,
respectively.


4.1.3 Inner Product 


The structure of a vector space is enormously enriched by introducing the 
concept of inner product, which enables us to define the length of a vector 
in a given vector space or the angle between the two vectors involved. 


@ Inner product: 
An inner product is a scalar-valued function of the ordered pair of vectors 
x and y such that 
1. $(x, y) = (y, x)^*$.
2. $(\alpha x + \beta y, z) = \alpha^*(x, z) + \beta^*(y, z)$, where $\alpha$ and $\beta$ are certain complex
numbers.
3. $(x, x) \ge 0$ for any $x$; $(x, x) = 0$ if and only if $x = 0$.
Here, the asterisk ($*$) indicates that one is to take the complex conjugate.


Remark. Vector spaces endowed with an inner product are called inner prod- 
uct spaces. In particular, a real inner product space is called a Euclidean 
space and a complex inner product space is called a unitary space. 


The algebraic properties 1 and 2 are in principle the same as those governing 
the scalar product in ordinary vector algebra in a real vector space. The 
only property that is not obvious is that in a complex space, the inner product 
is not linear, but rather conjugate linear with respect to the first factor, i.e., 


    $(\alpha x, y) = \alpha^*(x, y)$.


Examples 1. The simplest, but an important, example of an inner product
space is the space, denoted by $C^n$, that consists of $n$-tuples of complex numbers
$(z_1, z_2, \cdots, z_n)$. For two vectors $x = (\xi_1, \xi_2, \cdots, \xi_n)$ and $y = (\eta_1, \eta_2, \cdots, \eta_n)$
in $C^n$, the inner product is defined by

    $(x, y) = \sum_{i=1}^{n} \xi_i^* \eta_i.$

2. Suppose that $f(x)$ and $g(x)$ are polynomials in the complex vector space
defined on the closed interval $x \in [0,1]$. They then constitute an inner
product space under the inner product defined by

    $(f, g) = \int_0^1 f(x)^* g(x) w(x)\,dx,$


where w(x) is a weight function. The weight function becomes impor- 
tant when defining the inner product of polynomials, which is treated in 
Chap. 5. 




3. If $x = [\xi_1, \xi_2, \xi_3, \xi_4]$ and $y = [\eta_1, \eta_2, \eta_3, \eta_4]$ are column four-vectors
having real-valued elements, then the quantity

    $(x, y) = \xi_1\eta_1 + \xi_2\eta_2 + \xi_3\eta_3 - \xi_4\eta_4$   (4.1)


satisfies requirements 1 and 2 for an inner product, but not 3 since the 
quantity (x, x) is not positive-definite. Thus the entity (4.1) is not an inner 
product, but it plays an important role in the theory of special relativity. 
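
The three requirements can be checked on concrete numbers. The sketch below is illustrative only: the function names `inner` and `minkowski_form` and the sample vectors are assumptions made here. It uses the convention of the definition above, in which the inner product on $C^n$ is conjugate linear in the first factor, and it shows that the form (4.1) vanishes on a nonzero vector.

```python
def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def minkowski_form(x, y):
    """The form (4.1) on real four-vectors; it is not positive definite."""
    return x[0] * y[0] + x[1] * y[1] + x[2] * y[2] - x[3] * y[3]

x = [1 + 2j, 3 - 1j, 0.5j]
y = [2 - 1j, 1j, 4 + 0j]
alpha = 2 - 3j
alpha_x = [alpha * xi for xi in x]

# Property 1: (x, y) equals the complex conjugate of (y, x).
print(inner(x, y), inner(y, x).conjugate())

# Conjugate linearity in the first factor: (alpha x, y) = alpha* (x, y).
print(inner(alpha_x, y), alpha.conjugate() * inner(x, y))

# Property 3: (x, x) is real and positive for x != 0.
print(inner(x, x))

# The form (4.1) gives zero for this nonzero vector, so property 3 fails.
light_like = [1.0, 0.0, 0.0, 1.0]
print(minkowski_form(light_like, light_like))   # 0.0
```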


For a complex vector space, the inner product is not symmetrical as it is in
a real vector space. That is, $(x, y) \ne (y, x)$ in general, but rather $(x, y) = (y, x)^*$.
This implies that $(x, x)$ is real for every $x$, so we can define the length of the
vector $x$ by

    $\|x\| = (x, x)^{1/2}.$

Since $(x, x) \ge 0$, $\|x\|$ is always nonnegative and real. The quantity $\|x\|$ is
referred to as the norm of the vector $x$. Note also that

    $\|\alpha x\| = (\alpha x, \alpha x)^{1/2} = [\alpha^* \alpha (x, x)]^{1/2} = |\alpha| \cdot \|x\|.$


Remark. Precisely speaking, the quantity $\|x\|$ introduced above is a special
kind of norm that is associated with an inner product; in fact, the norm was
originally a more general concept that was independent of the inner product
(see Sect. 4.2.2).


4.1.4 Geometry of Inner Product Spaces 


Once a vector space is endowed with an inner product, several important the- 
orems that can be easily interpreted in analogy with Euclidean geometry can 
be applied. The following three theorems characterize the geometric nature of 
inner product spaces ($x \ne 0$ and $y \ne 0$ are assumed; otherwise the theorems
all become trivial).


@ Schwarz inequality: 
For any two elements $x$ and $y$ of an inner product space, we have

    $|(x, y)| \le \|x\| \, \|y\|.$   (4.2)

The equality holds if and only if $x$ and $y$ are linearly dependent.


Proof From the definition of the inner product, we have

    $0 \le (x + \alpha y, x + \alpha y) = (x, x) + \alpha(x, y) + \alpha^*(y, x) + |\alpha|^2 (y, y).$   (4.3)

Now, set $\alpha = -(y, x)/(y, y)$ and multiply by $(y, y)$ to obtain

    $0 \le (x, x)(y, y) - |(x, y)|^2,$

which gives Schwarz's inequality.


Next, we prove the statement of the equality in (4.2). If $x$ and $y$ are linearly
dependent, then $y = \alpha x$ for some complex number $\alpha$ so that we have

    $|(x, y)| = |(x, \alpha x)| = |\alpha| (x, x) = |\alpha| \, \|x\| \, \|x\| = \|x\| \, \|\alpha x\| = \|x\| \, \|y\|.$

The converse is also true; let $x$ and $y$ be vectors such that $|(x, y)| = \|x\| \, \|y\|$,
or equivalently,

    $|(x, y)|^2 = (x, y)(y, x) = (x, x)(y, y) = \|x\|^2 \|y\|^2.$   (4.4)

Then we compute

    $\|(y, y)x - (y, x)y\|^2$
    $= \|y\|^4 \|x\|^2 + |(y, x)|^2 \|y\|^2 - \|y\|^2 (y, x)(x, y) - \|y\|^2 (y, x)^*(y, x)$
    $= 0,$   (4.5)

where the postulate (4.4) and the relation $(y, x)^* = (x, y)$ were used. The
result (4.5) means that

    $(y, y)x - (y, x)y = 0,$

which clearly shows that $x$ and $y$ are linearly dependent, which completes the
proof. &
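
A numerical spot check of the Schwarz inequality may help fix ideas. This is only a sketch under assumptions made here: vectors in $C^5$ with randomly drawn components, and helper names `inner`, `norm` and `residual` that are not from the text. For a generic pair the inequality is strict, while for $y = \alpha x$ the gap closes and the vector $(y,y)x - (y,x)y$ appearing in (4.5) vanishes.

```python
import math
import random

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def residual(x, y):
    """The vector (y, y) x - (y, x) y used in (4.5)."""
    yy, yx = inner(y, y), inner(y, x)
    return [yy * xi - yx * yi for xi, yi in zip(x, y)]

rng = random.Random(0)   # fixed seed so that the run is reproducible
x = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]
y = [complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(5)]

# Generic pair: ||x|| ||y|| - |(x, y)| is strictly positive.
gap_generic = norm(x) * norm(y) - abs(inner(x, y))

# Linearly dependent pair y = alpha x: the gap vanishes up to rounding.
alpha = 0.7 - 1.9j
y_dep = [alpha * xi for xi in x]
gap_dependent = norm(x) * norm(y_dep) - abs(inner(x, y_dep))

print(gap_generic, gap_dependent, norm(residual(x, y_dep)))
```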


@ Triangle inequality: 
For any two elements $x$ and $y$ of an inner product space, we have

    $\|x + y\| \le \|x\| + \|y\|.$


Proof Setting $\alpha = 1$ in (4.3), we have

    $\|x + y\|^2 = (x, x) + (y, y) + 2\,\mathrm{Re}(x, y)$
    $\le (x, x) + (y, y) + 2|(x, y)|$
    $\le \|x\|^2 + \|y\|^2 + 2\|x\| \, \|y\|$   (by Schwarz's inequality)
    $= (\|x\| + \|y\|)^2,$

which proves the desired inequality. &




@ Parallelogram law: 
For any two elements $x$ and $y$ of an inner product space, we have

    $\|x + y\|^2 + \|x - y\|^2 = 2\left( \|x\|^2 + \|y\|^2 \right).$


Proof We have

    $\|x + y\|^2 = (x, x) + (x, y) + (y, x) + (y, y)$
    $= \|x\|^2 + (x, y) + (y, x) + \|y\|^2.$   (4.6)

Now replace $y$ by $-y$ to obtain

    $\|x - y\|^2 = \|x\|^2 - (x, y) - (y, x) + \|y\|^2.$   (4.7)

By adding (4.6) and (4.7), we attain our objective. &
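
The triangle inequality and the parallelogram law can be verified on a concrete pair of vectors. The following sketch assumes two sample vectors in $C^3$ chosen here for illustration; the helper names are likewise not from the text.

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

def add(x, y):
    return [a + b for a, b in zip(x, y)]

def sub(x, y):
    return [a - b for a, b in zip(x, y)]

x = [1 + 1j, -2j, 3 + 0j]
y = [0.5 - 1j, 2 + 2j, -1j]

# Triangle inequality: this difference must be nonnegative.
triangle_gap = norm(x) + norm(y) - norm(add(x, y))

# Parallelogram law: both sides must agree up to rounding.
parallelogram_lhs = norm(add(x, y)) ** 2 + norm(sub(x, y)) ** 2
parallelogram_rhs = 2 * (norm(x) ** 2 + norm(y) ** 2)

print(triangle_gap, parallelogram_lhs, parallelogram_rhs)
```

For these vectors $\|x\|^2 = 15$ and $\|y\|^2 = 10.25$, so both sides of the parallelogram law equal $50.5$.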


4.1.5 Orthogonality 


One of the most important consequences of having the inner product is being 
able to define the orthogonality of vectors. The orthogonality allows us to 
establish a set of orthonormal bases that span the inner product space in 
question, thus yielding a useful way to analyze both the nature of the space 
itself and the relation between the constituents involved in that space. 


@ Orthogonality: 
Two vectors $x$ and $y$ in an inner product space are called orthogonal
if and only if $(x, y) = 0$.


Notably, if $(x, y) = 0$, then $(y, x) = (x, y)^* = 0$ as well.
Thus, the orthogonality is a symmetric relation, although the inner product
is not symmetric. Note also that the zero vector $0$ is orthogonal to every
vector in the inner product space.

A set of $n$ vectors $\{x_1, x_2, \cdots, x_n\}$ is called orthonormal if $(x_i, x_j) = \delta_{ij}$
for all $i$ and $j$, where $\delta_{ij}$ is the Kronecker delta. That is, the orthonormality
of a set of vectors means that each vector is orthogonal to all the others in
the set and is normalized to unit length.

It follows that any vector $x$ may be normalized by dividing by its length to
form the new vector $x/\|x\|$ with unit length. An example of an orthonormal
set of vectors is the set of three unit vectors, $\{e_i\}$ $(i = 1,2,3)$, for the
three-dimensional Cartesian space.

The following theorem is important in various fields of mathematical 
physics. 




@ Theorem: 


An orthonormal set is linearly independent. 


(Proof of the theorem is given in Exercise 1). Importantly, the above theorem 
suggests that any orthonormal set serves as a basis for an inner product space 
of interest (see Sect. 4.2.5). Below is another consequence of the orthonormal 
set of vectors; its proof is given in Exercise 2. 


@ Bessel inequality: 


If $\{x_1, x_2, \cdots, x_n\}$ is a set of orthonormal vectors and $x$ is any vector
defined in the same inner product space, then

    $\|x\|^2 \ge \sum_{i=1}^{n} |r_i|^2,$   (4.8)

where $r_i = (x_i, x)$. Furthermore, the vector $x' = x - \sum_i r_i x_i$ is orthogonal
to each $x_j$.
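
The Bessel inequality can be illustrated in $C^3$ with an orthonormal set of two vectors. The set and the vector $x$ below are assumptions chosen for this sketch, as are the variable names. The deficit $\|x\|^2 - \sum_i |r_i|^2$ equals $\|x'\|^2$, and $x'$ is orthogonal to every member of the set.

```python
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

s = 1 / math.sqrt(2)
# Two orthonormal vectors in C^3 (an incomplete orthonormal set).
orthonormal_set = [[1 + 0j, 0j, 0j], [0j, s + 0j, s * 1j]]
x = [2 - 1j, 1 + 3j, -1 + 0.5j]

# Coefficients r_i = (x_i, x).
r = [inner(e, x) for e in orthonormal_set]

# Residual x' = x - sum_i r_i x_i.
residual = list(x)
for ri, e in zip(r, orthonormal_set):
    residual = [xk - ri * ek for xk, ek in zip(residual, e)]

norm_sq = inner(x, x).real
bessel_sum = sum(abs(ri) ** 2 for ri in r)

print(norm_sq, bessel_sum, inner(residual, residual).real)
print([abs(inner(e, residual)) for e in orthonormal_set])
```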


4.1.6 Completeness of Vector Spaces 


Having described features of inner product spaces, we turn now to an- 
other important concept relevant to the nature of Hilbert spaces, i.e., com- 
pleteness. When a vector space is finite dimensional, the completeness 
of an orthonormal set involved in the space may be characterized by the 
fact that it is not contained in any larger orthonormal set. (This is
intuitively understood by considering the Cartesian basis $e_i$ $(i = 1,2,3)$ in a
three-dimensional Euclidean space.) When considering an infinite-dimensional
space, however, the completeness must be determined via the Cauchy
criterion, which we discussed in Sect. 2.2. The following is a preliminary
definition.


@ Cauchy sequence of vectors: 

A sequence $\{x_1, x_2, \cdots\}$ of vectors is called a Cauchy sequence of
vectors if for any $\varepsilon > 0$, there exists an appropriate number $N$
such that $\|x_m - x_n\| < \varepsilon$ for all $m, n > N$.


In plain words, a sequence is a Cauchy sequence if the terms $x_m$ and $x_n$ in
the sequence come closer and closer to each other as $m, n \to \infty$.




@ Convergence of a sequence of vectors: 
A sequence $\{x_1, x_2, \cdots\}$ is said to be convergent if there exists an
element $x$ such that $\|x_n - x\| \to 0$.


@ Completeness of a vector space: 


If every Cauchy sequence in a space is convergent, we say that the space 
is complete. 


Remark. Here the norm $\|x\| = (x, x)^{1/2}$ associated with an inner product is
employed to define a Cauchy sequence, since we are focusing on inner product
spaces. However, the concepts of Cauchy sequence and completeness both
apply to more general spaces in which even a norm is unnecessary (see
Sect. 4.2.2 for details).


4.1.7 Several Examples of Hilbert Spaces 


Now we are ready to define Hilbert spaces. 


@ Hilbert space: 


If an inner product space is complete, it is called a Hilbert space. 


Examples 1. Column-vector spaces with $n$ real and complex components,
denoted by $R^n$ and $C^n$, respectively, are finite-dimensional Hilbert spaces
if endowed with an inner product $(x, y) = \sum_{i=1}^{n} x_i^* y_i$. Completeness can
be proved using the Bolzano-Weierstrass theorem (see Appendix A).

2. Assume an infinite-dimensional vector $x = (x_1, x_2, \cdots)$, where $x_i$ is a real
or complex number satisfying the condition

    $\sum_{i=1}^{\infty} |x_i|^2 < \infty.$

Then, vector spaces spanned by a set of vectors $\{x\}$, called $\ell^2$ spaces (see
Sect. 4.3), are Hilbert spaces under the inner product

    $(x, y) = \sum_{i=1}^{\infty} x_i^* y_i.$

Completeness will be proved in Sect. 4.3.1.




4. Assume a set of square-integrable functions $f(x)$ satisfying

    $\int_a^b |f(x)|^2\,dx < \infty.$

Then, the collection of all square-integrable functions, called the $L^2$ space,
is a Hilbert space endowed with the inner product

    $(f, g) = \int_a^b f(x)^* g(x)\,dx.$   (4.9)

Completeness will be proved in Sect. 4.3.2.
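5. Finally we show an example of an incomplete inner product space. Consider
the following sequence of real-valued continuous functions $\{f_1(x), f_2(x), \cdots\}$,
each of which is defined within the interval $[0, 1]$:

    $f_n(x) = 1$   for $0 \le x \le \frac{1}{2}$,
    $f_n(x) = 1 - 2n\left(x - \frac{1}{2}\right)$   for $\frac{1}{2} \le x \le \frac{1}{2} + \frac{1}{2n}$,   (4.10)
    $f_n(x) = 0$   for $\frac{1}{2} + \frac{1}{2n} \le x \le 1$.

The graphs of $f_n(x)$ for $n = 1,2,3$ are given in Fig. 4.1. After some
algebra, we obtain

    $\|f_n - f_m\| = \left[ \int_0^1 |f_n(x) - f_m(x)|^2\,dx \right]^{1/2} = \frac{1}{\sqrt{6n}} \left( 1 - \frac{n}{m} \right) \to 0 \quad (m > n \to \infty).$

[Fig. 4.1. The function $f_n(x)$ given in (4.10), plotted against $x$ on $[0, 1]$. The
sequence $\{f_n(x)\}$ converges to a step function in the limit of $n \to \infty$.]

Thus, $\{f_n\}$ is a Cauchy sequence with respect to the norm induced by the
inner product (4.9). However, this sequence converges to the limit function

    $f(x) = 1$ for $0 \le x \le \frac{1}{2}$,   $f(x) = 0$ for $\frac{1}{2} < x \le 1$,

which is not continuous and, hence, is not an element of the original inner
product space. Consequently, the space is not complete, and thus is
not a Hilbert space.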
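
The Cauchy property of this sequence can be checked numerically. The sketch below takes $f_n$ to fall linearly from 1 at $x = 1/2$ to 0 at $x = 1/2 + 1/(2n)$, which is an assumption about the exact breakpoints in (4.10), and compares a midpoint-rule estimate of $\|f_n - f_m\|$ with the value $(1 - n/m)/\sqrt{6n}$ obtained by direct integration for $m > n$. The function names are introduced here for illustration.

```python
import math

def f(n, x):
    """Piecewise-linear function f_n on [0, 1] (breakpoints assumed as stated)."""
    if x <= 0.5:
        return 1.0
    if x <= 0.5 + 1.0 / (2 * n):
        return 1.0 - 2 * n * (x - 0.5)
    return 0.0

def l2_distance(n, m, steps=100000):
    """Midpoint-rule estimate of ||f_n - f_m|| in the norm induced by (4.9)."""
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += (f(n, x) - f(m, x)) ** 2
    return math.sqrt(total * h)

def closed_form(n, m):
    """(1 - n/m) / sqrt(6 n), valid for m > n."""
    return (1 - n / m) / math.sqrt(6 * n)

for n, m in [(1, 2), (5, 50), (50, 500)]:
    print(n, m, l2_distance(n, m), closed_form(n, m))
```

The distances shrink as $n$ grows, although no continuous function on $[0, 1]$ is the limit of the sequence in this norm.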


Exercises 


1. Show that an orthonormal set is linearly independent. 
Solution: Recall that a set of vectors $\{x_1, x_2, \cdots, x_n\}$ is said to
be linearly independent if and only if

    $\sum_{i=1}^{n} \alpha_i x_i = 0 \;\Rightarrow\; \alpha_i = 0$   for all $i$.

Now suppose that a set $\{x_1, x_2, \cdots, x_n\}$ is orthonormal and
satisfies the relation $\sum_i \alpha_i x_i = 0$. Then, for any $j$, the orthonormal
condition $(x_j, x_i) = \delta_{ij}$ results in

    $0 = \left( x_j, \sum_{i=1}^{n} \alpha_i x_i \right) = \sum_{i=1}^{n} \alpha_i (x_j, x_i) = \sum_{i=1}^{n} \alpha_i \delta_{ij} = \alpha_j.$

Therefore, the set is linearly independent. &


2. Show the Bessel inequality for $x$ given by (4.8) and the orthogonality of the
vector $x' = x - \sum_i r_i x_i$ to each $x_j$.


Solution: We consider the inequality

    $0 \le \|x'\|^2 = (x', x') = \left( x - \sum_{i=1}^{n} r_i x_i, \; x - \sum_{j=1}^{n} r_j x_j \right)$
    $= (x, x) - \sum_{i=1}^{n} r_i^* (x_i, x) - \sum_{j=1}^{n} r_j (x, x_j) + \sum_{i,j=1}^{n} r_i^* r_j (x_i, x_j)$
    $= \|x\|^2 - \sum_{i=1}^{n} |r_i|^2 - \sum_{j=1}^{n} |r_j|^2 + \sum_{i=1}^{n} |r_i|^2$
    $= \|x\|^2 - \sum_{i=1}^{n} |r_i|^2.$

Thus we have $\|x\|^2 \ge \sum_i |r_i|^2$. The second part of the theorem is
proven by

    $(x_j, x') = (x_j, x) - \sum_{i=1}^{n} r_i (x_j, x_i) = r_j - r_j = 0.$ &


4.2 Hierarchical Structure of Vector Spaces 


4.2.1 Precise Definitions of Vector Spaces 


In this section, we look at the hierarchical structure of abstract vector spaces. 
We will find that the Hilbert spaces that we have considered form a very 
limited, special class of general vector spaces under strict conditions. We begin 
with an exact definition of vector spaces. 


@ Vector spaces: 


A vector space $V$ is a set of elements $x$ (called vectors) that satisfy the
following sets of axioms:


1. $V$ is a commutative group under addition:
(i) $x + y = y + x \in V$ for any $x, y \in V$ (closedness).
(ii) $x + (y + z) = (x + y) + z$ (associativity).
(iii) There exists an additive identity, the zero vector $0$, such that
$x + 0 = x$ for every $x \in V$.
(iv) There exists an additive inverse $-x$ for every $x \in V$ such that
$x + (-x) = 0$.


2. $V$ satisfies the following additional axioms with respect to a number
field $F$, whose elements $\alpha$ are called scalars:
(i) $V$ is closed under scalar multiplication:

    $\alpha x \in V$   for arbitrary $x \in V$ and $\alpha \in F$.

(ii) Scalar multiplication is distributive with respect to elements of both
$V$ and $F$:

    $\alpha(x + y) = \alpha x + \alpha y$,   $(\alpha + \beta)x = \alpha x + \beta x$.

(iii) Scalar multiplication is associative: $\alpha(\beta x) = (\alpha\beta)x$.

(iv) Multiplication with the zero scalar $0 \in F$ gives the zero vector such
that $0x = 0 \in V$.

(v) The unit scalar $1 \in F$ has the property that $1x = x$.


In these definitions, $F$ is either the set of real numbers, $R$, or the set of
complex numbers, $C$. A vector space over $R$ is called a real vector space.
If $F = C$, then $V$ is a complex vector space.



4.2.2 Metric Space 


Once a vector space is endowed with the concept of a distance between the
elements, say, $x \in V$ and $y \in V$, it is called a metric space.


@ Metric space:


Assume a vector space $V$. A metric space is the pair $(V, \rho)$ in which the
function $\rho: V \times V \to R$, called the distance function, is a single-valued,
nonnegative, real function that satisfies:


1. $\rho(x, y) = 0$ if and only if $x = y$.
2. $\rho(x, y) = \rho(y, x)$.
3. $\rho(x, y) \le \rho(x, z) + \rho(z, y)$ for any $z \in V$.


Remark. Strictly speaking, the above is called a metric vector space, which is a
special case of more general metric spaces. The latter consists of a pair $(U, \rho)$,
where $U$ is a set of points (not necessarily vectors) and $\rho$ is a distance function.
If $U$ is a vector space $V$, then $(V, \rho)$ is called a metric vector space.


Examples 1. If we set

    $\rho(x, y) = 0$ if $x = y$,   $\rho(x, y) = 1$ if $x \ne y$,

for arbitrary $x, y \in V$, we obtain a metric space.


2. The set of real numbers $R$ with the distance function $\rho(x, y) = |x - y|$
forms a metric space.


3. The set of ordered $n$-tuples of real numbers $x = (x_1, x_2, \cdots, x_n)$ with the
distance function

    $\rho(x, y) = \left[ \sum_{i=1}^{n} (x_i - y_i)^2 \right]^{1/2}$

is a metric space. This is in fact the Euclidean $n$-space, denoted by $R^n$.

4. Consider again the set of ordered $n$-tuples of real numbers $x = (x_1,
x_2, \cdots, x_n)$ with an alternative distance function:

    $\rho(x, y) = \max\{ |x_i - y_i| ; \; 1 \le i \le n \}.$

This also serves as a metric space. The validity of Axioms 1-3 mentioned
above is obvious.


Comparison between Examples 3 and 4 tells us that the same vector space 
V can be metrized in different ways. These two examples call attention 
to the importance of distinguishing a metric space (V,p) from the vector 
space V. 
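
The two metrizations of Examples 3 and 4 can be compared directly. The following sketch uses a few sample points in $R^3$ chosen here for illustration and checks the three axioms on that sample only; such a check is of course not a proof. The function names are assumptions of this example.

```python
import itertools
import math

def euclidean(x, y):
    """Distance function of Example 3."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def maximum(x, y):
    """Distance function of Example 4."""
    return max(abs(a - b) for a, b in zip(x, y))

def satisfies_axioms(rho, points, tol=1e-12):
    """Check Axioms 1-3 on every triple drawn from the sample points."""
    for x, y, z in itertools.product(points, repeat=3):
        if (rho(x, y) == 0) != (x == y):
            return False                      # Axiom 1
        if abs(rho(x, y) - rho(y, x)) > tol:
            return False                      # Axiom 2
        if rho(x, y) > rho(x, z) + rho(z, y) + tol:
            return False                      # Axiom 3
    return True

points = [(0.0, 0.0, 0.0), (1.0, 2.0, -2.0), (3.0, -1.0, 0.5), (1.0, 2.0, -2.0)]

print(satisfies_axioms(euclidean, points), satisfies_axioms(maximum, points))
# The same pair of points has different distances in the two metrics.
print(euclidean(points[0], points[1]), maximum(points[0], points[1]))   # 3.0 2.0
```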


4.2.3 Normed Spaces 


A metric space is said to be normed if for each element $x \in V$ there is a
corresponding nonnegative number $\|x\|$, which is called the norm of $x$.


@ Normed space:


A metric space equipped with a norm is called a normed space. The
norm is defined as a real-valued function (denoted by $\| \; \|$) on a vector space
$V$, which satisfies


1. $\|\lambda x\| = |\lambda| \, \|x\|$ for all $\lambda \in F$ and $x \in V$.
2. $\|x + y\| \le \|x\| + \|y\|$.
3. $\|x\| = 0$ if and only if $x = 0$.


Obviously, a normed space is a metric space under the definition of the
distance $\rho(x, y) = \|x - y\|$.


Examples 1. The space consisting of all $n$-tuples of real numbers $x =
(x_1, x_2, \cdots, x_n)$ in which the norm is defined by

    $\|x\| = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2}$

is a normed space.


2. The space above can be normed by a more general form:

    $\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$   $(p \ge 1)$.

This norm is referred to as a $p$-norm of the vector $x$.

3. We further obtain an alternative normed space if we set the norm of the
vector $x = (x_1, x_2, \cdots, x_n)$ equal to $\max\{ |x_k| ; \; 1 \le k \le n \}$.

4. The collection of all continuous functions defined on the closed interval
$[a, b]$ in which

    $\|f(x)\| = \max\{ |f(x)| : x \in [a, b] \}$

is a normed space.

5. The space consisting of all sequences $x = (x_1, x_2, \cdots, x_n, \cdots)$ of real numbers
that satisfy the condition $\lim_{n \to \infty} x_n = 0$ is a normed space if we set

    $\|x\| = \max\{ |x_n| : 1 \le n < \infty \}.$


4.2.4 Subspaces of a Normed Space 


A class of normed spaces involves the following two subclasses: one endowed 
with completeness and the other with the inner product. The normed spaces 
of the former class, i.e., a class of complete normed vector spaces, are called 
Banach spaces. 


@ Banach space: 


If a normed space is complete, it is called a Banach space. 
Here, the completeness of a space implies that every Cauchy sequence in the 
space is convergent. Refer to the arguments in Sect. 4.1.6 for details. 


Remark. Every finite-dimensional normed space is a Banach space, since it is 
necessarily complete. 


Examples 1. Suppose that a set of infinite-dimensional vectors $x = (x_1, x_2, \cdots,
x_n, \cdots)$ satisfies the condition

    $\sum_{i=1}^{\infty} |x_i|^p < \infty$   $(p \ge 1)$.

Then, this set is a Banach space, called an $\ell^p$ space, under the $p$-norm
defined by

    $\|x\|_p = \left( \sum_{i=1}^{\infty} |x_i|^p \right)^{1/p}.$   (4.11)

The proof of its completeness is given in Sect. 4.3.1.
2. Assume a set of functions $f(x)$ satisfying

    $\int_a^b |f(x)|^p\,dx < \infty.$

Then, this set constitutes a specific class of Banach spaces, called $L^p$
spaces, under the $p$-norm:

    $\|f\|_p = \left( \int_a^b |f(x)|^p\,dx \right)^{1/p}.$   (4.12)

Completeness is proved in Sect. 4.3.2.


Now we focus on the counterpart, i.e., a normed space (not necessarily
complete) endowed with an inner product, known as a pre-Hilbert space.




@ Pre-Hilbert space: 


If a normed space is equipped with an inner product (not necessarily 
complete), then it is called a pre-Hilbert space. 


Finally, we are at a point at which we can appreciate the definition of Hilbert
spaces. They are defined as the intersection between Banach spaces and
pre-Hilbert spaces, as stated below.


@ Hilbert space: 


A complete pre-Hilbert space, i.e., a complete normed space endowed 
with an inner product is called a Hilbert space. 


Examples The $\ell^p$ spaces and $L^p$ spaces with $p = 2$, known as the $\ell^2$ spaces
and $L^2$ spaces, are Hilbert spaces. The inner product of each space,
respectively, is given by

    $(x, y) = \sum_{i=1}^{\infty} x_i^* y_i$   and   $(f, g) = \int_a^b f^*(x) g(x)\,dx.$   (4.13)


Remark. Clearly the quantities $(x, x)^{1/2}$ and $(f, f)^{1/2}$, defined through the
inner products (4.13), are special cases of the $p$-norm given by (4.11) and
(4.12), respectively, with $p = 2$. In fact, for the $\ell^2$ and $L^2$ spaces, the inner
products are defined such that

    $(x, x) = \|x\|_2^2$   and   $(f, f) = \|f\|_2^2.$

However, for $\ell^p$ and $L^p$ spaces with $p \ne 2$, we cannot introduce inner products
as

    $(x, x) = (\|x\|_p)^2$   and   $(f, f) = (\|f\|_p)^2$

because unless $p = 2$ the $p$-norm violates the parallelogram law. Accordingly,
among the family of $\ell^p$ and $L^p$, only the spaces $\ell^2$ and $L^2$ can be Hilbert
spaces because they have an inner product.
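
The failure of the parallelogram law for $p \ne 2$ already shows up in two dimensions. The sketch below evaluates the defect $\|x+y\|_p^2 + \|x-y\|_p^2 - 2(\|x\|_p^2 + \|y\|_p^2)$ for the unit vectors $x = (1, 0)$ and $y = (0, 1)$, a choice made here for illustration; for this pair the defect equals $2 \cdot 2^{2/p} - 4$, which vanishes only at $p = 2$.

```python
def p_norm(x, p):
    """The p-norm of a finite real vector."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2), all in the p-norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * (p_norm(x, p) ** 2 + p_norm(y, p) ** 2))

x = [1.0, 0.0]
y = [0.0, 1.0]
for p in (1, 1.5, 2, 3):
    print(p, parallelogram_defect(x, y, p))
```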


4.2.5 Basis of a Vector Space: Revisited 


For use in Sect. 4.2.6, we briefly review the definition of a basis in a
finite-dimensional vector space and related matters.




@ Linearly independent vector: 


A finite set of vectors, say, $e_1, e_2, \cdots, e_n$, is linearly independent if
and only if

    $\sum_{i=1}^{n} c_i e_i = 0 \;\Rightarrow\; c_i = 0$   for all $i$.   (4.14)

This definition applies to infinite sets of vectors $e_1, e_2, \cdots$ if the vector space
under consideration admits a definition of convergence (see Sect. 4.2.6 for
details).


@ Basis of a vector space: 


A basis of the vector space $V$ is a set of linearly independent vectors
$\{e_i\}$ of $V$ such that every vector $x$ of $V$ can be expressed as

    $x = \sum_{i=1}^{n} \alpha_i e_i.$   (4.15)

Here, the numbers $\alpha_1, \alpha_2, \cdots, \alpha_n$ are coordinates of the vector $x$ with
respect to the basis, and they are uniquely determined owing to the linear
independence property.


Therefore, every set of n linearly independent vectors is a basis in a finite- 
dimensional vector space spanned by n vectors. The number n is called the di- 
mension of the vector space. Obviously, an infinite-dimensional vector space 
does not admit a finite basis, which is why it is called infinite-dimensional. 


4.2.6 Orthogonal Bases in Hilbert Spaces 


For any vector space (finite- or infinite-dimensional), a set of orthogonal
vectors $\{x_n\}$ is called an orthogonal basis if it is complete. Similarly, a
complete orthogonal set of vectors is called an orthonormal basis if the norm
$\|x_n\| = 1$ for all $n$. It is convenient to use orthonormal bases in studying
Hilbert spaces, since any vector in the space can be decomposed into a linear
combination of the orthonormal basis vectors. However, when we choose some
basis for an infinite-dimensional space, some care must be taken to examine its
completeness property; i.e., an infinite sum of vectors in a vector space may or
may not converge to an element of the same vector space.

To examine this point, let us consider an infinite set $\{e_i\}$ $(i = 1,2,\cdots)$ of
orthonormal vectors all belonging to a Hilbert space $V$. We take any vector
$x \in V$ and form the sequence of vectors

    $x_n = \sum_{i=1}^{n} c_i e_i,$   (4.16)

where the complex number $c_i$ is the inner product of $e_i$ and $x$ expressed by
$c_i = (e_i, x)$.

For the pair of vectors $x$ and $x_n$, the Schwarz inequality (4.2) gives

    $|(x, x_n)|^2 \le \|x\|^2 \|x_n\|^2 = \|x\|^2 \left( \sum_{i=1}^{n} |c_i|^2 \right).$   (4.17)

On the other hand, taking the inner product of (4.16) with $x$ yields

    $(x, x_n) = \sum_{i=1}^{n} c_i (x, e_i) = \sum_{i=1}^{n} |c_i|^2.$   (4.18)

From (4.17) and (4.18), we have

    $\sum_{i=1}^{n} |c_i|^2 \le \|x\|^2.$


This conclusion is true for arbitrarily large n and can be stated as shown 
below. 


@ Bessel inequality: 


Let $\{e_i\}$ $(i = 1,2,\cdots)$ be an infinite set of orthonormal vectors in a
Hilbert space $V$. Then for any $x \in V$ with $c_i = (e_i, x)$, we have

    $\sum_{i=1}^{\infty} |c_i|^2 \le \|x\|^2,$

which is known as the Bessel inequality.


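The Bessel inequality shows that the limiting vector

    $\lim_{n \to \infty} x_n = \lim_{n \to \infty} \sum_{i=1}^{n} c_i e_i = \sum_{i=1}^{\infty} c_i e_i$   (4.19)

has a finite norm, which means that the vector (4.19) is convergent (the partial
sums $x_n$ form a Cauchy sequence, which converges because $V$ is complete). However,
we still do not know whether it converges to $x$. To make such a statement, the
set $\{e_i\}$ should be equipped with the completeness property defined below.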


@ Complete orthonormal vectors: 


An infinite set of orthonormal vectors $\{e_i\}$ in a Hilbert space $V$ is called
complete if the only vector in $V$ that is orthogonal to all the $e_i$ is the zero
vector.




The following is an immediate consequence of the above statement. 


@ Parseval identity: 


Let $\{e_i\}$ be an infinite set of orthonormal vectors in a Hilbert space $V$.
Then for any $x \in V$,

    $\{e_i\}$ is complete $\;\Longleftrightarrow\;$ $\|x\|^2 = \sum_{i=1}^{\infty} |c_i|^2$   with $c_i = (e_i, x)$.   (4.20)


Proof Suppose that the set $\{e_i\}$ is complete and consider the vector defined
by

    $y = x - \sum_{i=1}^{\infty} c_i e_i,$

where $x \in V$ and $c_i = (e_i, x)$. It follows that for any $e_j$,

    $(e_j, y) = (e_j, x) - \sum_{i=1}^{\infty} c_i (e_j, e_i) = c_j - \sum_{i=1}^{\infty} c_i \delta_{ji} = 0.$   (4.21)

In view of the definition of the completeness of $\{e_i\}$, (4.21) means that $y$ is
the zero vector. Hence, we have

    $x = \sum_{i=1}^{\infty} c_i e_i,$

which implies

    $\|x\|^2 = \sum_{i=1}^{\infty} |c_i|^2.$

We now consider the converse. Suppose $x$ to be orthogonal to all the $\{e_i\}$,
which means

    $(e_i, x) = c_i = 0$   for all $i$.   (4.22)

It follows from (4.20) and (4.22) that $\|x\|^2 = 0$, which in turn gives $x =
0$, because only the zero vector has a zero norm. This completes the
proof. &
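
A finite-dimensional analogue makes the role of completeness concrete. The sketch below is an illustration in $C^4$, not the infinite-dimensional setting of the theorem: the discrete Fourier vectors are used as a complete orthonormal set, and removing one of them leaves an orthonormal set for which only the Bessel inequality holds. The choice of basis, the sample vector and the helper names are assumptions made here.

```python
import cmath
import math

def inner(x, y):
    """Inner product on C^n, conjugate-linear in the first argument."""
    return sum(xi.conjugate() * yi for xi, yi in zip(x, y))

n = 4
# Discrete Fourier vectors: a complete orthonormal set in C^4.
basis = [[cmath.exp(2j * math.pi * m * k / n) / math.sqrt(n) for k in range(n)]
         for m in range(n)]
x = [1 + 2j, -1j, 3 + 0j, 0.5 - 0.5j]

def coefficient_sum(vectors, x):
    """Sum of |c_i|^2 with c_i = (e_i, x)."""
    return sum(abs(inner(e, x)) ** 2 for e in vectors)

norm_sq = inner(x, x).real
full = coefficient_sum(basis, x)             # complete set: Parseval identity
truncated = coefficient_sum(basis[:-1], x)   # incomplete set: Bessel only

print(norm_sq, full, truncated)
# The removed vector is nonzero yet orthogonal to every remaining one,
# which is exactly what incompleteness of the truncated set means.
print([abs(inner(e, basis[-1])) for e in basis[:-1]])
```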


We close this section by providing precise terminology for the basis of a Hilbert 
space. 




@ Basis of a Hilbert space: 


A complete orthonormal set $\{e_i\}$ $(i = 1,2,\cdots)$ in a Hilbert space $V$ is
called a basis of $V$.


Remark. 


1. The concept of completeness of an orthonormal set of vectors is distinct 
from the concept of completeness of the Hilbert space, but they are mu- 
tually related. 

2. In order to define generalized Fourier coefficients $c_i = (e_i, x)$ for
$x \in V$ (see Sect. 4.3.4), it suffices for the set $\{e_i\}$ to be only orthonormal,
not necessarily complete.


4.3 Hilbert Spaces of $\ell^2$ and $L^2$


4.3.1 Completeness of the $\ell^2$ Spaces


In this subsection, we examine the completeness property of the space $\ell^2$ on
the field $F$ (here $F = R$ or $C$). As already noted, the completeness of a given
vector space $V$ is characterized by the fact that every Cauchy sequence $(x_n)$
involved in the space converges to an element $x \in V$ such that
$\lim_{n \to \infty} \|x - x_n\| = 0$. Hence, to prove the completeness of the $\ell^2$ space,
we show in turn that (1) every Cauchy sequence $(x_n)$ in the $\ell^2$ space converges
to a limit $x$, and (2) the limit $x$ belongs to $\ell^2$.
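We consider Statement (1). Assume a set of infinite-dimensional vectors

    $x^{(n)} = \left( x_1^{(n)}, x_2^{(n)}, \cdots \right),$

wherein $x_i^{(n)} \in F$, and let the sequence of vectors $\{x^{(1)}, x^{(2)}, \cdots\}$ be a Cauchy
sequence in the sense of the norm

    $\|x\| = \left( \sum_{i=1}^{\infty} |x_i|^2 \right)^{1/2} < \infty.$

Then, for any $\varepsilon > 0$, there exists an integer $N$ such that

    $m, n > N \;\Rightarrow\; \|x^{(m)} - x^{(n)}\| = \left( \sum_{i=1}^{\infty} \left| x_i^{(m)} - x_i^{(n)} \right|^2 \right)^{1/2} < \varepsilon.$   (4.23)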




This implies that

$$\left|x_i^{(m)} - x_i^{(n)}\right| < \varepsilon \tag{4.24}$$

for every $i$ and every $m, n > N$. Furthermore, since (4.23) is true in the limit
$m \to \infty$, we find

$$\left\|x - x^{(n)}\right\| < \varepsilon \tag{4.25}$$

for arbitrary $n > N$. The inequalities (4.24) and (4.25) mean that $x^{(n)}$
converges to the limiting vector expressed by $x \equiv (x_1, x_2, \cdots)$, in which the
component $x_i \in F$ is defined by

$$x_i = \lim_{n\to\infty} x_i^{(n)}. \tag{4.26}$$

(That the limit (4.26) belongs to $F$ is guaranteed by the completeness of $F$.)

The remaining task is to show that the limiting vector $x$ belongs to the
original space $\ell^2$. By the triangle inequality, we have

$$\|x\| = \left\|x - x^{(n)} + x^{(n)}\right\| \le \left\|x - x^{(n)}\right\| + \left\|x^{(n)}\right\|.$$

Hence, for every $n > N$ and for every $\varepsilon > 0$, we obtain

$$\|x\| < \varepsilon + \left\|x^{(n)}\right\|.$$

As the Cauchy sequence $(x^{(1)}, x^{(2)}, \cdots)$ is bounded, $\|x\|$ cannot be greater
than

$$\varepsilon + \limsup_{n\to\infty} \left\|x^{(n)}\right\|$$

and is therefore finite. This implies that the limit vector $x$ belongs to $\ell^2(F)$.
Consequently, we have proven that the space $\ell^2(F)$ is complete.
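
As a small numerical illustration of the Cauchy condition (4.23), the sketch below uses the truncations $x^{(n)} = (1, 1/2, \cdots, 1/n, 0, 0, \cdots)$ of the square-summable sequence $x_i = 1/i$. This particular sequence and the helper names `truncation` and `l2_distance` are assumptions chosen for illustration; they do not come from the text.

```python
import math

def truncation(n):
    """x^(n) = (1, 1/2, ..., 1/n); the trailing zeros are left implicit."""
    return [1.0 / i for i in range(1, n + 1)]

def l2_distance(x, y):
    """l^2 norm of x - y for two finitely supported sequences."""
    length = max(len(x), len(y))
    x = x + [0.0] * (length - len(x))
    y = y + [0.0] * (length - len(y))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# ||x^(m) - x^(n)||^2 = sum_{i=n+1}^{m} 1/i^2 < 1/n, so the sequence is Cauchy.
for n in (10, 100, 1000):
    d = l2_distance(truncation(2 * n), truncation(n))
    print(n, d, d < 1 / math.sqrt(n))

# The limit x = (1, 1/2, 1/3, ...) has the finite norm pi/sqrt(6), so it lies in l^2.
print(l2_distance(truncation(100000), []), math.pi / math.sqrt(6))
```

The componentwise limit here is $x_i = 1/i$, in line with (4.26), and its norm stays finite, which is the content of Statement (2).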


Remark. Among the various kinds of Hilbert spaces, the space $\ell^2$ has a
significant importance in mathematical physics, mainly because it provides the
groundwork for the theory of quantum mechanics. In fact, any element $x$ of
the space $\ell^2$ satisfying the normalization condition
$\|x\|^2 = \sum_{i=1}^{\infty} |x_i|^2 = 1$ works as a possible state vector of quantum
systems. In the Heisenberg formulation of quantum mechanics, the
infinite-dimensional matrices corresponding to physical observables act on
these state vectors.


4.3.2 Completeness of the $L^2$ Spaces


We next consider another important class of Hilbert spaces, called $L^2$ spaces,
which are spanned by square-integrable functions $\{f_n(x)\}$ on a closed interval,
say $[a, b]$. To prove the completeness of the $L^2$ space, we show that every
Cauchy sequence $\{f_n\}$ in the $L^2$ space converges to a limit function $f(x)$, and
then verify that $f$ belongs to $L^2$.

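Let $\{f_1(x), f_2(x), \cdots\}$ be a Cauchy sequence in $L^2$. Then for any small
$\varepsilon > 0$, we can find an integer $N$ such that

$$m, n > N \;\Rightarrow\; \|f_n - f_m\| = \left[\int_a^b |f_n(x) - f_m(x)|^2\, dx\right]^{1/2} < \varepsilon.$$

Then, it is always possible to find an integer $n_1$ such that

$$n > n_1 \;\Rightarrow\; \|f_n(x) - f_{n_1}(x)\| < \frac{1}{2}.$$

By mathematical induction, after finding $n_{k-1} > n_{k-2}$, we find $n_k > n_{k-1}$
such that

$$n > n_k \;\Rightarrow\; \|f_n(x) - f_{n_k}(x)\| < \left(\frac{1}{2}\right)^k.$$

In this way, we obtain a subsequence $(f_{n_k})$ such that

$$\|f_{n_{k+1}}(x) - f_{n_k}(x)\| < \left(\frac{1}{2}\right)^k \quad \text{for } k = 1, 2, \cdots,$$

and consequently

$$\|f_{n_1}\| + \sum_{k=1}^{\infty} \|f_{n_{k+1}} - f_{n_k}\| < \|f_{n_1}\| + \sum_{k=1}^{\infty} \left(\frac{1}{2}\right)^k = \|f_{n_1}\| + 1 \equiv A,$$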


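where $A$ is a finite constant. Let

$$g_k \equiv |f_{n_1}| + |f_{n_2} - f_{n_1}| + \cdots + |f_{n_{k+1}} - f_{n_k}| \quad (k = 1, 2, \cdots).$$

Then, by the Minkowski inequality, we have

$$\int_a^b [g_k(x)]^2 dx = \int_a^b \left(|f_{n_1}| + \sum_{i=1}^{k} |f_{n_{i+1}} - f_{n_i}|\right)^2 dx \le \left(\|f_{n_1}\| + \sum_{i=1}^{k} \|f_{n_{i+1}} - f_{n_i}\|\right)^2 \le A^2 < \infty. \tag{4.27}$$

Let $g(x) = \lim_{k\to\infty} g_k(x)$. Then $[g(x)]^2 = \lim_{k\to\infty} [g_k(x)]^2$, and

$$\int_a^b [g(x)]^2 dx = \int_a^b \lim_{k\to\infty} [g_k(x)]^2 dx = \lim_{k\to\infty} \int_a^b [g_k(x)]^2 dx. \tag{4.28}$$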




[See the remark below for the interchangeability of the limit and integral signs
in (4.28).] It follows from (4.27) and (4.28) that

$$\int_a^b [g(x)]^2 dx < \infty,$$

or equivalently,

$$\int_a^b \left[|f_{n_1}| + \sum_{k=1}^{\infty} |f_{n_{k+1}} - f_{n_k}|\right]^2 dx < \infty.$$

This implies that the infinite sum

$$f_{n_1} + \sum_{k=1}^{\infty} \left(f_{n_{k+1}} - f_{n_k}\right) \tag{4.29}$$

converges to a function, denoted by $f \in L^2$, in the sense of the norm in $L^2$.
We next show that the limit function $f(x)$ expressed by (4.29) is an element
of $L^2$ such that

$$\|f_n(x) - f(x)\| \to 0 \quad (n \to \infty). \tag{4.30}$$
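We first note that

$$f(x) - f_{n_j}(x) = \sum_{k=j}^{\infty} \left[f_{n_{k+1}}(x) - f_{n_k}(x)\right].$$

It follows that

$$\|f - f_{n_j}\| \le \sum_{k=j}^{\infty} \|f_{n_{k+1}} - f_{n_k}\| < \sum_{k=j}^{\infty} \left(\frac{1}{2}\right)^k = \frac{1}{2^{j-1}},$$

so we have

$$\lim_{j\to\infty} \|f - f_{n_j}\| = 0.$$

Observe that

$$\|f_n - f\| \le \|f_n - f_{n_k}\| + \|f_{n_k} - f\|,$$

where $\|f_n - f_{n_k}\| \to 0$ as $n \to \infty$ and $k \to \infty$; thus

$$\lim_{n\to\infty} \|f_n - f\| = 0,$$

which shows that the Cauchy sequence $(f_n)$ converges to $f \in L^2$.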


Remark. The interchangeability of limit and integral signs in (4.28) is justified
by the following three facts:

(i) The sequence $([g_k(x)]^2)$ is a sequence of square-integrable functions on
$[a, b]$,
(ii) $[g_k(x)]^2 \ge 0$ for all $k$, and
(iii) The integral $\int_a^b [g_k(x)]^2 dx$ for each $k$ has a common bound $A^2$ as shown in
(4.27).

The proof of this point is based on the theory of the Lebesgue integral, which
we discuss in Chap. 6.




4.3.3 Mean Convergence 


Before proceeding, a comment is in order on a new class of convergence that is
relevant to the argument on the completeness of the $L^2$ space. Observe
that the expression (4.30) can be rephrased as follows: For any small
$\varepsilon > 0$, it is possible to find $N$ such that

$$n > N \;\Rightarrow\; \|f(x) - f_n(x)\| < \varepsilon. \tag{4.31}$$

Hence, we can say that the infinite sequence $(f_n)$ converges to $f(x)$ in the norm
of the $L^2$ space. Convergence of the type (4.31) is called convergence
in the mean or mean convergence, which is inherently different from
uniform convergence and pointwise convergence. The point is that in mean
convergence, the deviation between $f_n(x)$ and $f(x)$ is measured not by the
difference $f(x) - f_n(x)$ at each point, but by the norm in the $L^2$ space based
on the integration procedure:

$$\|f(x) - f_n(x)\| = \left[\int_a^b |f(x) - f_n(x)|^2\, dx\right]^{1/2}.$$

Hence, when $f_n(x)$ converges in the mean to $f(x)$ on the interval $[a, b]$,
there may exist a finite number of isolated points at which $f_n(x)$ does not
converge to $f(x)$. Obviously, this situation is not allowed in cases of uniform
or pointwise convergence.
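
A standard example, not taken from the text, makes the distinction concrete: on $[0, 1]$ the functions $f_n(x) = x^n$ converge in the mean to $f(x) = 0$, since $\|f_n - 0\| = (2n+1)^{-1/2} \to 0$, although $f_n(1) = 1$ for every $n$. The sketch below checks the norm by a midpoint rule; the function name `l2_norm_of_power` and the step count are illustrative assumptions.

```python
import math

def l2_norm_of_power(n, steps=20000):
    """Midpoint-rule estimate of ( integral_0^1 x^(2n) dx )^(1/2)."""
    h = 1.0 / steps
    total = sum(((k + 0.5) * h) ** (2 * n) for k in range(steps))
    return math.sqrt(total * h)

for n in (1, 10, 100):
    numeric = l2_norm_of_power(n)
    exact = 1.0 / math.sqrt(2 * n + 1)
    # f_n(1) = 1 for every n, so sup |f_n - 0| on [0, 1] never shrinks.
    print(n, numeric, exact, 1.0 ** n)
```

The $L^2$ distance to the zero function shrinks while the value at the single point $x = 1$ stays fixed, so the convergence is in the mean but neither uniform nor pointwise on the whole interval.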


4.3.4 Generalized Fourier Coefficients 


Having clarified the completeness property of the two specific Hilbert spaces,
$\ell^2$ and $L^2$, we introduce two important concepts: generalized Fourier
coefficients and generalized Fourier series. We shall see that they play a
crucial role in revealing the close relationship between the two distinct Hilbert
spaces $\ell^2$ and $L^2$.

♠ Generalized Fourier coefficients:


Suppose that a set of square-integrable functions $\{\phi_i\}$ is orthonormal
(not necessarily complete) in the norm of the $L^2$ space. Then, the numbers

$$c_k = (f, \phi_k) \tag{4.32}$$

are called the Fourier coefficients of the function $f \in L^2$ relative to the
orthonormal set $\{\phi_i\}$, and the series

$$\sum_{k=1}^{\infty} c_k \phi_k \tag{4.33}$$

is called the Fourier series of $f$ with respect to the set $\{\phi_i\}$.




Remark. 


1. In general, the Fourier series shown in (4.33) may or may not be
convergent; its convergence property is determined by the features of the
function $f$ and the associated orthonormal set of functions $\{\phi_k\}$.

2. Some readers may be familiar with the Fourier series associated with
trigonometric functions or imaginary exponentials. Notably, however, the
concepts of Fourier series and Fourier coefficients introduced above are
more general than those associated with trigonometric series.


The importance of the Fourier coefficients (4.32) becomes apparent when we
see that they constitute an element of the $\ell^2$ space. In fact, since $c_k$ is the
inner product of $f$ and $\phi_k$, it yields the Bessel inequality in terms of $c_k$
and $f$:

$$\sum_{k=1}^{\infty} |c_k|^2 \le \|f\|^2. \tag{4.34}$$

From the hypothesis of $f \in L^2$, the norm $\|f\|$ remains finite. Hence, the
inequality (4.34) ensures the convergence of the infinite series $\sum_{k=1}^{\infty} |c_k|^2$,
which consists of the Fourier coefficients defined by (4.32). This convergence
means that the sequence of Fourier coefficients $\{c_k\}$ is an element of the space
$\ell^2$, whichever orthonormal set of functions $\phi_k(x)$ we choose. In this context,
the two elements $f \in L^2$ and $c = (c_1, c_2, \cdots) \in \ell^2$ are connected via the
Fourier coefficient (4.32).
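
To see the Bessel inequality (4.34) at work, the following sketch takes $f(x) = x$ on $[-\pi, \pi]$ with the orthonormal set $\phi_k(x) = \sin(kx)/\sqrt{\pi}$. Both choices are assumptions made for illustration. For this pair the coefficients have the closed form $c_k = 2\sqrt{\pi}\,(-1)^{k+1}/k$ and $\|f\|^2 = 2\pi^3/3$.

```python
import math

def fourier_coefficient(k):
    """c_k = (f, phi_k) for f(x) = x and phi_k(x) = sin(kx)/sqrt(pi) on [-pi, pi]."""
    return 2.0 * math.sqrt(math.pi) * (-1) ** (k + 1) / k

norm_squared = 2.0 * math.pi ** 3 / 3.0   # ||f||^2 = integral of x^2 over [-pi, pi]

partial = 0.0
for k in range(1, 2001):
    partial += fourier_coefficient(k) ** 2
    if k in (1, 10, 100, 2000):
        # Every partial sum of |c_k|^2 stays below ||f||^2.
        print(k, partial, norm_squared, partial <= norm_squared)
```

The partial sums increase toward $\|f\|^2$ without exceeding it, so the sequence $(c_1, c_2, \cdots)$ is indeed an element of $\ell^2$.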


4.3.5 Riesz-Fischer Theorem


Recall that every Fourier coefficient satisfies the Bessel inequality (4.34).
Hence, in order for a given set of complex numbers $(c_k)$ to constitute the
Fourier coefficients of a function $f \in L^2$, it is necessary that the series

$$\sum_{k=1}^{\infty} |c_k|^2$$

converge. As a matter of fact, this condition is not only necessary, but also
sufficient, as stated in the theorem below.


♠ Riesz-Fischer theorem:
Given any set of complex numbers $(c_k)$ such that

$$\sum_{k=1}^{\infty} |c_k|^2 < \infty, \tag{4.35}$$

there exists a function $f \in L^2$ such that

$$c_k = (f, \phi_k) \quad \text{and} \quad \sum_{k=1}^{\infty} |c_k|^2 = \|f\|^2, \tag{4.36}$$

where $\{\phi_k\}$ is a complete orthonormal set.


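Proof Set linear combinations of $\phi_k(x)$ as

$$f_n(x) = \sum_{k=1}^{n} c_k \phi_k(x), \tag{4.37}$$

where the $c_k$ are arbitrary complex numbers satisfying condition (4.35). Then,
for a given integer $p \ge 1$, we obtain

$$\|f_{n+p} - f_n\|^2 = \|c_{n+1}\phi_{n+1} + \cdots + c_{n+p}\phi_{n+p}\|^2 = \sum_{k=n+1}^{n+p} |c_k|^2. \tag{4.38}$$

Since the series (4.35) converges, its tail tends to zero, so that for every $p \ge 1$
we have

$$\|f_{n+p} - f_n\|^2 = \sum_{k=n+1}^{n+p} |c_k|^2 \to 0 \quad (n \to \infty).$$

This tells us that the infinite sequence $\{f_n\}$ defined by (4.37) associated with a
given set of complex numbers $\{c_k\}$ is a Cauchy sequence in $L^2$ and therefore,
by the completeness of $L^2$ shown in Sect. 4.3.2, always converges in the mean
to a function $f \in L^2$.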

Our remaining task is to show that this limit function $f(x)$ satisfies
condition (4.36), so we consider the inner product

$$(f, \phi_i) = (f_n, \phi_i) + (f - f_n, \phi_i), \tag{4.39}$$

where we assume $n \ge i$. It follows from (4.37) that the first term on the
right-hand side is equal to $c_i$. The second term vanishes as $n \to \infty$, since

$$|(f - f_n, \phi_i)| \le \|f - f_n\| \cdot \|\phi_i\| \to 0 \quad (n \to \infty),$$

where we used the mean convergence of $\{f_n\}$ to $f$. In addition, the left-hand
side of (4.39) is independent of $n$. Hence, taking the limit $n \to \infty$ on both
sides of (4.39), we obtain

$$(f, \phi_i) = c_i, \tag{4.40}$$

which means that $c_i$ is the Fourier coefficient of $f$ relative to $\phi_i$. From our
assumption, the set $\{\phi_i\}$ is complete and orthonormal. Hence, the Fourier
coefficients (4.40) satisfy the Parseval identity:

$$\sum_{k=1}^{\infty} |c_k|^2 = \|f\|^2. \tag{4.41}$$

The results (4.40) and (4.41) are identical to condition (4.36), thus proving
the theorem. ♣


4.3.6 Isomorphism between $\ell^2$ and $L^2$


The Riesz-Fischer theorem results immediately in the isomorphism between
the Hilbert spaces $L^2$ and $\ell^2$. An isomorphism is a one-to-one correspondence
that preserves the entire algebraic structure. For instance, two vector spaces
$U$ and $V$ (over the same number field) are isomorphic if there exists a
one-to-one correspondence between the vectors $x_i$ in $U$ and $y_i$ in $V$, say
$y_i = f(x_i)$, such that

$$f(\alpha_1 x_1 + \alpha_2 x_2) = \alpha_1 f(x_1) + \alpha_2 f(x_2).$$


The isomorphism between $L^2$ and $\ell^2$ is closely related to the theory
of quantum mechanics, which originally consisted of two distinct theories:
Heisenberg's matrix mechanics, based on infinite-dimensional vectors, and
Schrödinger's wave mechanics, based on square-integrable functions. From the
mathematical point of view, the difference between the two theories reduces
to the fact that the former uses the space $\ell^2$, whereas the latter uses the space
$L^2$. Hence, the isomorphism between the two spaces verifies the equivalence
of the two theories describing the nature of quantum mechanics.

Let us prove the above point. Choose an arbitrary complete orthonormal
set $\{\phi_n\}$ in $L^2$ and assign to each function $f \in L^2$ the sequence
$(c_1, c_2, \cdots, c_n, \cdots)$ of its Fourier coefficients with respect to this set. Since

$$\sum_{k=1}^{\infty} |c_k|^2 = \|f\|^2 < \infty,$$

the sequence $(c_1, c_2, \cdots, c_n, \cdots)$ is an element of $\ell^2$. Conversely, in view of
the Riesz-Fischer theorem, for every element $(c_1, c_2, \cdots, c_n, \cdots)$ of $\ell^2$ there
is a function $f(x) \in L^2$ whose Fourier coefficients are $c_1, c_2, \cdots, c_n, \cdots$. This
correspondence between the elements of $L^2$ and $\ell^2$ is one-to-one. Furthermore,
if

$$f(x) \leftrightarrow (c_1, c_2, \cdots, c_n, \cdots)$$

and

$$g(x) \leftrightarrow (d_1, d_2, \cdots, d_n, \cdots),$$

then

$$f(x) + g(x) \leftrightarrow (c_1 + d_1, \cdots, c_n + d_n, \cdots)$$

and

$$k f(x) \leftrightarrow (k c_1, k c_2, \cdots, k c_n, \cdots),$$

which readily follows from the definition of Fourier coefficients (the reader
should prove it). That is, addition and multiplication by scalars are preserved
by the correspondence. Furthermore, in view of Parseval's identity, it follows
that

$$(f, g) = \sum_{i=1}^{\infty} c_i^* d_i. \tag{4.42}$$

All of these facts ensure the isomorphism between the spaces $L^2$ and $\ell^2$,
i.e., the one-to-one correspondence between the elements of $L^2$ and $\ell^2$ that
preserves the algebraic structures of the space. In this context, we may say
that every element $\{c_i\}$ in an $\ell^2$ space serves as a coordinate system of the $L^2$
space, and vice versa.


Exercises 


1. Prove the inequality $\sum_{k=1}^{\infty} |c_k|^2 \le \|f\|^2$ given in (4.34).

Solution: Suppose a partial sum $S_n(x) = \sum_{k=1}^{n} a_k \phi_k(x)$, where
$a_k$ is a certain number (real or complex). Since the set $\{\phi_k\}$ is
orthonormal,

$$\|f(x) - S_n(x)\|^2 = \left(f - \sum_{k=1}^{n} a_k \phi_k,\; f - \sum_{k=1}^{n} a_k \phi_k\right) = \|f\|^2 - \sum_{k=1}^{n} |c_k|^2 + \sum_{k=1}^{n} |a_k - c_k|^2. \tag{4.43}$$

The minimum of (4.43) is attained if $a_k = c_k$. In that case,
equation (4.43) reads

$$\left\|f(x) - \sum_{k=1}^{n} c_k \phi_k(x)\right\|^2 = \|f\|^2 - \sum_{k=1}^{n} |c_k|^2,$$

which implies $\sum_{k=1}^{n} |c_k|^2 \le \|f\|^2$. Since the right-hand side is
independent of $n$, the value of $n$ can be taken arbitrarily large.
Hence, by taking the limit $n \to \infty$, we attain the desired result:
$\sum_{k=1}^{\infty} |c_k|^2 \le \|f\|^2$.
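2. Verify the equation $(f, g) = \sum_{i=1}^{\infty} c_i^* d_i$ given in (4.42).

Solution: This equality is verified because of the relations
$(f, f) = \sum_{i=1}^{\infty} |c_i|^2$ and $(g, g) = \sum_{i=1}^{\infty} |d_i|^2$, and their
consequences (written here for real-valued $f$ and $g$, for which $c_i^* = c_i$):

$$(f + g, f + g) = (f, f) + 2(f, g) + (g, g) = \sum_{i=1}^{\infty} |c_i + d_i|^2 = \sum_{i=1}^{\infty} |c_i|^2 + 2\sum_{i=1}^{\infty} c_i d_i + \sum_{i=1}^{\infty} |d_i|^2.$$

Comparing the two expansions term by term gives $(f, g) = \sum_{i=1}^{\infty} c_i d_i$.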


5 


Orthonormal Polynomials 


Abstract The theory of Hilbert spaces we dealt with in Chap. 4 can be used to
construct a number of polynomial functions that are orthonormal and complete in
the sense of the $L^2$ space. In this chapter we present three important approaches
for the construction of orthonormal polynomials, based, respectively, on the
Weierstrass theorem (Sect. 5.1.1), the Rodrigues formula (Sect. 5.2.1), and generating
functions (Sect. 5.2.7). We shall find that various orthonormal polynomials relevant
to mathematical physics can be effectively classified by adopting these methods.


5.1 Polynomial Approximations 


5.1.1 Weierstrass Theorem 


There are a number of special polynomials that play a significant role in
various aspects of mathematical physics: Legendre, Laguerre, Hermite, and
Chebyshev polynomials are well known. For instance, Legendre and Laguerre
polynomial expansions are often used to solve second-order differential
equations having spherical symmetry. The point is that many of these special
polynomials form a complete orthonormal set of polynomials; the origin
of their orthonormality and completeness can be accounted for in terms of
the theory of the Hilbert space $L^2$. Owing to completeness, these special
polynomials enable us to produce polynomial approximations of fairly arbitrary
functions with desired accuracy, which serves as a useful device in manipulating
square-integrable functions.

The validity of polynomial approximations is based on the famous
Weierstrass approximation theorem, which states that from the set of
powers of a real variable $x$ one can construct a sequence of polynomials that
converges uniformly to any continuous function within a finite interval $[a, b]$.
From this result, we shall see that it is possible to find various kinds of
complete orthonormal sets of polynomials on any interval $[a, b]$.

In what follows, for simplicity we focus on polynomial approximations
only of real-valued functions of a real variable. In the case of a complex-valued
function, the separate validity of the theorem for each of its real and imaginary
parts ensures the validity of the theorem.


♠ Weierstrass approximation theorem:
If a function $f(x)$ is continuous on the closed interval $[a, b]$, there exists
a sequence of polynomials

$$G_n(x) = \sum_{k=0}^{n} c_k^{(n)} x^k \tag{5.1}$$

that converges uniformly to $f(x)$ on $[a, b]$.


The proof is given in Appendix C. Several remarks on this theorem are given
below.


In the polynomial approximation based on (5.1), the values of the coefficients
$c_m^{(n)}$ depend on $n$ for fixed $m$. Thus, in order to improve the accuracy of
the approximation by going to polynomials of higher degree, the earlier
coefficients must change. For instance, when the approximating polynomial
(5.1) is replaced by

$$G_{n+1}(x) = \sum_{k=0}^{n+1} d_k x^k,$$

we have in general

$$c_k^{(n)} \ne d_k \quad \text{for all } k\ (\le n).$$

This situation is in contrast to the case of our familiar Taylor series
expansions, in which the earlier coefficients remain unchanged.

The Weierstrass theorem requires only the continuity of the function to be
approximated. This condition is much weaker than that of Taylor's theorem for
expansion in power series, in which the derivatives of all orders must exist
(i.e., the function must be analytic; see Sect. 7.1.2 for the definition of
analytic functions). Furthermore, the former theorem can apply to polynomial
approximations outside the radius of convergence (see Sect. 7.4.1) of a
Taylor series.
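
One well-known constructive route to such approximating polynomials uses Bernstein polynomials, $B_n f(x) = \sum_{k=0}^{n} f(k/n) \binom{n}{k} x^k (1-x)^{n-k}$ on $[0, 1]$. The proof in Appendix C may proceed differently, so the sketch below is only an illustration under that assumption. The test function $|x - 1/2|$, which is continuous but has no Taylor expansion about $x = 1/2$, and all function names are chosen here for illustration.

```python
import math

def bernstein(f, n, x):
    """Value at x of the degree-n Bernstein polynomial of f on [0, 1]."""
    return sum(f(k / n) * math.comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

def kink(x):
    """Continuous on [0, 1] but not differentiable at x = 1/2."""
    return abs(x - 0.5)

def max_error(n, samples=200):
    """Largest deviation |B_n f - f| on a uniform grid of [0, 1]."""
    grid = [j / samples for j in range(samples + 1)]
    return max(abs(bernstein(kink, n, x) - kink(x)) for x in grid)

for n in (4, 16, 64):
    print(n, max_error(n))   # the maximum error decreases as n grows
```

Each increase of the degree rebuilds the whole polynomial from the samples $f(k/n)$, which reflects the remark that the earlier coefficients change with $n$.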

The Weierstrass theorem may be extended to functions of more than one
variable. By a straightforward generalization of the proof (see Appendix
C), it can be shown that if a function $f(x_1, x_2, \cdots, x_m)$ is continuous
in each variable $x_i$ located within $[a_i, b_i]$ $(i = 1, 2, \cdots, m)$, it may be
approximated uniformly by the polynomials

$$G_n(x_1, x_2, \cdots, x_m) = \sum_{k_1=0}^{n} \sum_{k_2=0}^{n} \cdots \sum_{k_m=0}^{n} c_{k_1 k_2 \cdots k_m}\, x_1^{k_1} x_2^{k_2} \cdots x_m^{k_m}.$$

The special cases of $m = 2$ and $m = 3$ are considered in Sects. 5.1.4 and
5.1.5.


5.1.2 Existence of Complete Orthonormal Sets of Polynomials


It must be emphasized that the Weierstrass theorem does not require the set
of polynomials $\{G_n\}$ to be either orthogonal or complete. Nevertheless, the
theorem ensures indirectly the existence of a variety of complete orthonormal
sets of polynomials in terms of the $L^2$ space. The proof of their existence
is based on the Gram-Schmidt orthogonalization method shown below.


♠ Gram-Schmidt orthogonalization method:

Given any set of linearly independent functions $\{\varphi_i\}$ normalizable on a
closed interval, it is possible to construct an orthonormal set of functions
$\{Q_i\}$ through the recursion formula

$$Q_i(x) = \frac{u_i(x)}{\|u_i\|} \quad (i = 1, 2, \cdots)$$

with the definitions:

$$u_1(x) = \varphi_1(x), \quad u_{i+1}(x) = \varphi_{i+1}(x) - \sum_{k=1}^{i} \frac{(u_k, \varphi_{i+1})}{(u_k, u_k)}\, u_k(x).$$

Here, $(u_k, \varphi_{i+1})$ means the inner product in terms of the $L^2$ space, and the
division by $(u_k, u_k)$ accounts for the fact that the $u_k$ are not yet normalized.
Let us apply the Gram-Schmidt orthogonalization process to a set of powers
$\{x^n\}$ that is linearly independent. We then obtain an orthonormal set $\{Q_n\}$
given by

$$Q_n(x) = \sum_{m=0}^{n} a_m^{(n)} x^m. \tag{5.2}$$


m=0 


Owing to the orthogonality of the set $\{Q_n\}$, the original functions $x^m$ are
expressed conversely by linear combinations of $\{Q_n\}$ such as

$$x^m = \sum_{n=0}^{m} b_n^{(m)} Q_n(x).$$

The superscripts $(n)$ and $(m)$ attached to the coefficients $a^{(n)}$ and $b^{(m)}$,
respectively, remind us that the values of the terms contained in the finite
sequences,

$$\{a_0^{(n)}, a_1^{(n)}, \cdots, a_n^{(n)}\} \quad \text{and} \quad \{b_0^{(m)}, b_1^{(m)}, \cdots, b_m^{(m)}\},$$

depend on $n$ or $m$: as $n$ (or $m$) increases, all the earlier terms in the sequence
must be altered.

Now let us show the completeness of the orthonormal set $\{Q_n(x)\}$ given by
(5.2), which was deduced from the orthogonalization process; this is achieved
by proving that Parseval's identity,

$$\sum_{n=0}^{\infty} |(f, Q_n)|^2 = \|f\|^2,$$

holds for any $f \in L^2$, or equivalently, by proving that

$$(f, Q_n) = 0 \ \text{ for all } n \iff \|f\| = 0. \tag{5.5}$$


The sentence "$\|f\| = 0$ implies $(f, Q_n) = 0$ for all $n$" immediately follows
from the Bessel inequality,

$$\sum_{n=0}^{\infty} |(f, Q_n)|^2 \le \|f\|^2.$$

To prove the converse, we note that if $(f, Q_n) = 0$ for all $n$, we have

$$(f, G_n) = 0 \ \text{ for all } n, \tag{5.6}$$

since the $G_n$ are linear combinations of the $Q_n$. In addition, we recall that
the Weierstrass theorem guarantees the uniform convergence of the sequence
$(G_n)$ to $f$. Since uniform convergence implies mean convergence, we obtain

$$\|f - G_n\| \to 0. \tag{5.7}$$

From (5.6) and (5.7), it follows that

$$\|f - G_n\|^2 = (f - G_n, f - G_n) = \|f\|^2 + \|G_n\|^2 \to 0,$$

which implies that $\|f\| = 0$ as well as $\|G_n\|^2 \to 0$. (This is because $\|f\|^2$ is
independent of $n$ and $\|G_n\|^2$ is nonnegative for all $n$.) In summary, we
attain the desired conclusion (5.5), which indicates that the orthonormal set
$\{Q_n\}$ is complete in terms of the $L^2$ space.

The completeness of the set $\{Q_i\}$ means that there exists a set of constants
$\{c_i\}$ such that any function $g \in L^2$ can be approximated in the mean by the
following sequence of partial sums:

$$g_n(x) = \sum_{i=0}^{n} c_i Q_i(x). \tag{5.8}$$

The reader should appreciate a crucial difference between (5.1) and (5.8). In
the latter, the $c_i$ are independent of $n$ in contrast to the case of (5.1). Thus as
we extend the sum to infinity, the approximation improves without changing
the earlier $c_i$. Therefore, we may say that there exists an infinite series

$$\lim_{n\to\infty} g_n(x) = \sum_{i=0}^{\infty} c_i Q_i(x)$$

that converges to $g$ in the mean. The expansion coefficients $c_i = (g, Q_i)$ in the
infinite series are the Fourier coefficients we introduced in Sect. 4.3.4.


5.1.3 Legendre Polynomials 


The previous discussion revealed that the orthonormal set $\{Q_i\}$ constructed
from the orthogonalization process based on the set of powers $\{x^n\}$ is
complete, so that the linear combination $\sum_i c_i Q_i$ converges in the mean to
$f \in L^2$. Let us employ this result to find an explicit functional form of a
complete orthogonal set of functions $\{P_n\}$ defined on the interval $[-1, 1]$. The
first member of such a complete orthogonal set is $P_0(x) = 1$ (for convenience,
the normalization constant is omitted temporarily). Using the Gram-Schmidt
orthogonalization process, we have

$$P_1(x) = x - \frac{(x, P_0)}{(P_0, P_0)} P_0 = x,$$

$$x^2 - \frac{(x^2, P_0)}{(P_0, P_0)} P_0 - \frac{(x^2, P_1)}{(P_1, P_1)} P_1 = x^2 - \frac{1}{3} \;\propto\; P_2(x) = \frac{1}{2}\left(3x^2 - 1\right),$$

where we use the notation $(f, g) = \int_{-1}^{1} f(x) g(x)\, dx$ and fix the overall constant
of each polynomial by the customary condition $P_n(1) = 1$. Continuing in the
same way, we find

$$P_3(x) = \frac{1}{2}\left(5x^3 - 3x\right), \quad P_4(x) = \frac{1}{8}\left(35x^4 - 30x^2 + 3\right),$$

$$P_5(x) = \frac{1}{8}\left(63x^5 - 70x^3 + 15x\right).$$

Eventually, we obtain the complete orthogonal set of polynomials $\{P_n\}$
known as the Legendre polynomials. The $x$ dependence of each function
is plotted in Fig. 5.1. Note that $P_n(x)$ has exactly $n$ distinct zeros in the
open interval $(-1, 1)$.
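
The orthogonalization of the powers $1, x, x^2, \cdots$ on $[-1, 1]$ can be sketched in exact rational arithmetic as follows. Polynomials are stored as coefficient lists (lowest power first). The function names and the final rescaling to the value 1 at $x = 1$ are assumptions made for this illustration.

```python
from fractions import Fraction

def inner(p, q):
    """(p, q) = integral_{-1}^{1} p(x) q(x) dx for coefficient lists p, q."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:               # odd powers integrate to zero
                total += a * b * Fraction(2, i + j + 1)
    return total

def gram_schmidt(max_degree):
    """Orthogonalize 1, x, x^2, ... on [-1, 1]; returns monic polynomials."""
    basis = []
    for n in range(max_degree + 1):
        u = [Fraction(0)] * n + [Fraction(1)]   # the monomial x^n
        for v in basis:
            coeff = inner(v, u) / inner(v, v)   # projection onto an earlier member
            u = [a - coeff * (v[k] if k < len(v) else 0)
                 for k, a in enumerate(u)]
        basis.append(u)
    return basis

def rescale_to_unit_endpoint(p):
    """Divide by p(1) so that the polynomial equals 1 at x = 1."""
    value_at_one = sum(p)
    return [a / value_at_one for a in p]

for u in gram_schmidt(3):
    print([str(a) for a in rescale_to_unit_endpoint(u)])
```

With exact fractions the rescaled output for degrees 2 and 3 has the coefficients $(-1/2, 0, 3/2)$ and $(0, -3/2, 0, 5/2)$, i.e. $\frac{1}{2}(3x^2 - 1)$ and $\frac{1}{2}(5x^3 - 3x)$.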


A general formula for $P_n(x)$ is given by

$$P_n(x) = \frac{1}{2^n} \sum_{k=0}^{[n/2]} \frac{(-1)^k (2n - 2k)!}{k!\,(n-k)!\,(n-2k)!}\, x^{n-2k}, \tag{5.9}$$

where we used the Gauss notation:

$$\left[\frac{n}{2}\right] = \begin{cases} \dfrac{n}{2} & \text{if } n \text{ is even,} \\[2mm] \dfrac{n-1}{2} & \text{if } n \text{ is odd.} \end{cases}$$

Equation (5.9) is rewritten in a simpler form as

$$P_n(x) = \frac{1}{2^n} \sum_{k=0}^{[n/2]} \frac{(-1)^k}{k!\,(n-k)!} \frac{d^n}{dx^n} x^{2n-2k}
= \frac{1}{2^n n!} \frac{d^n}{dx^n} \sum_{k=0}^{n} \frac{(-1)^k n!}{k!\,(n-k)!}\, x^{2n-2k}
= \frac{1}{2^n n!} \frac{d^n}{dx^n} \left(x^2 - 1\right)^n. \tag{5.10}$$

The last line is known as the Rodrigues formula for Legendre polynomials.
This is a special form of the more general Rodrigues formula that is
applicable to any orthonormal polynomial function. The derivations of (5.9) and
(5.10), as well as that of the general Rodrigues formula, are given in Sects. 5.2.1
and 5.2.2.

Fig. 5.1. Profiles of the first three terms of the Legendre polynomial $P_n(x)$

The orthogonality of the Legendre polynomials follows from the Rodrigues
formula (5.10). To see this, we denote $d^n/dx^n$ by $d_n$, and assume that $n \ge m$.
Dropping constant factors, we have

$$\int_{-1}^{1} P_n P_m\, dx \propto \int_{-1}^{1} \left[d_n (x^2 - 1)^n\right] \left[d_m (x^2 - 1)^m\right] dx$$
$$= \left[d_{n-1}(x^2 - 1)^n\right] \left[d_m (x^2 - 1)^m\right] \Big|_{-1}^{1} - \int_{-1}^{1} \left[d_{n-1}(x^2 - 1)^n\right] \left[d_{m+1}(x^2 - 1)^m\right] dx, \tag{5.11}$$

where we employed integration by parts. Since

$$d_{n-1}(x^2 - 1)^n = (x^2 - 1) \times (\text{a polynomial}),$$

the first term in the last line of (5.11) vanishes upon putting in the limits $\pm 1$,
leaving the second term alone. Therefore, after $n$ partial integrations, we have

$$\int_{-1}^{1} P_n(x) P_m(x)\, dx \propto (-1)^n \int_{-1}^{1} (x^2 - 1)^n\, d_{n+m}(x^2 - 1)^m\, dx.$$

Now, if $n > m$, then $n + m > 2m$ so that $d_{n+m}(x^2 - 1)^m = 0$. Therefore,

$$\int_{-1}^{1} P_n(x) P_m(x)\, dx = 0 \quad \text{for } m \ne n.$$

If $m = n$, then we have

$$\int_{-1}^{1} P_n(x)^2\, dx = \frac{(-1)^n}{2^{2n} (n!)^2} \int_{-1}^{1} (x^2 - 1)^n\, d_{2n}(x^2 - 1)^n\, dx, \tag{5.12}$$

where a normalization constant is explicitly attached. Since $(x^2 - 1)^n$ is a
polynomial of degree $2n$, its $(2n)$th derivative is just $(2n)!$. Hence, the integral
(5.12) reads

$$\int_{-1}^{1} P_n(x)^2\, dx = \frac{(2n)!}{2^{2n} (n!)^2} \int_{-1}^{1} (1 - x^2)^n\, dx = \frac{2}{2n+1}. \tag{5.13}$$

In summary, the orthogonality property of the Legendre polynomials
is given by

$$\int_{-1}^{1} P_m(x) P_n(x)\, dx = \begin{cases} 0 & (m \ne n), \\[1mm] \dfrac{2}{2n+1} & (m = n). \end{cases}$$



Remark. Equation (5.13) follows from the identity

$$\int_{-1}^{1} (1 - x^2)^n\, dx = 2^{2n+1} \int_0^1 t^n (1 - t)^n\, dt = 2^{2n+1} B(n+1, n+1)
= 2^{2n+1} \frac{\Gamma(n+1)^2}{\Gamma(2n+2)} = 2^{2n+1} \frac{(n!)^2}{(2n+1)!}.$$

Here, we have changed the variable by setting $x = 2t - 1$ to obtain the beta
function $B(x, y)$ and the gamma function $\Gamma(x)$ defined, respectively, by

$$B(x, y) = \int_0^1 t^{x-1} (1 - t)^{y-1}\, dt = \frac{\Gamma(x)\Gamma(y)}{\Gamma(x+y)},$$

$$\Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt.$$


5.1.4 Fourier Series 


We next consider the application of the Weierstrass theorem to functions
of two variables. Through this discussion, we obtain a proof of the
completeness of the set of trigonometric functions $\sin n\theta$ and $\cos n\theta$
$(n = 0, 1, \cdots, \infty)$.

The Weierstrass theorem tells us that any function $g(x, y)$ that is continuous
in both variables on finite closed intervals may be approximated uniformly
by the sequence of functions

$$g_N(x, y) = \sum_{n,m=0}^{N} a_{nm}^{(N)} x^n y^m. \tag{5.14}$$


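Employ polar coordinates and restrict the domain of definition to the unit
circle $x = \cos\theta$ and $y = \sin\theta$ to find

$$g_N(\cos\theta, \sin\theta) \equiv f_N(\theta) = \sum_{n,m=0}^{N} a_{nm}^{(N)} \cos^n\theta\, \sin^m\theta. \tag{5.15}$$

Clearly, $f_N(\theta)$ should be periodic with period $2\pi$. Using Euler's formula,

$$e^{i\theta} = \cos\theta + i\sin\theta,$$

we obtain expressions for the $n$th powers of $\sin\theta$ and $\cos\theta$:

$$\cos^n\theta = \left[\frac{1}{2}\left(e^{i\theta} + e^{-i\theta}\right)\right]^n, \quad \sin^n\theta = \left[\frac{1}{2i}\left(e^{i\theta} - e^{-i\theta}\right)\right]^n.$$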



We then rewrite (5.15) in the form

$$f_M(x) = \sum_{n=-M}^{M} c_n^{(M)} \frac{e^{inx}}{(2\pi)^{1/2}} \quad \text{with } M = 2N, \tag{5.16}$$

where we have inserted the factor $(2\pi)^{-1/2}$ for later convenience and have
replaced the variable $\theta$ by $x$ to emphasize the generality of the result.
The superscript $(M)$ attached to $c_n^{(M)}$ in (5.16) suggests the possibility that
the values of $c_n^{(M)}$ depend on $M$. However, this is not the case. In fact,
the values of the coefficients $c_n$ are determined independently of $M$ owing to
the completeness of the orthonormal set of functions

$$F_n(x) = \frac{e^{inx}}{\sqrt{2\pi}}, \quad n = 0, \pm 1, \cdots,$$

defined on the interval $[-\pi, \pi]$. The completeness of the set $\{F_n\}$ allows us to
approximate an arbitrary function $f$ in the mean by an infinite series of the
$F_n$, and we write

$$f(x) = \sum_{n=-\infty}^{\infty} c_n F_n(x) = \sum_{n=-\infty}^{\infty} c_n \frac{e^{inx}}{\sqrt{2\pi}}, \tag{5.17}$$

where the expansion coefficients are given by

$$c_n = (F_n, f) = \frac{1}{(2\pi)^{1/2}} \int_{-\pi}^{\pi} f(x)\, e^{-inx}\, dx. \tag{5.18}$$

The series (5.17) with the coefficients (5.18) is known as the trigonometric
Fourier series. The completeness of the set $\{F_n\}$ can be verified in a
discussion similar to that in Sect. 5.1.2.
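
As a rough numerical check of (5.17) and (5.18), the sketch below evaluates the coefficients for $f(x) = x$ by a midpoint rule and compares them with the closed form $c_n = i\sqrt{2\pi}\,(-1)^n/n$ $(n \ne 0)$, which follows from integrating by parts. The choice of $f$, the step count, and the function names are assumptions made for illustration.

```python
import cmath
import math

def fourier_coefficient(f, n, steps=4000):
    """Midpoint-rule estimate of c_n = (2 pi)^(-1/2) * integral f(x) e^{-inx} dx."""
    h = 2.0 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = -math.pi + (k + 0.5) * h
        total += f(x) * cmath.exp(-1j * n * x)
    return total * h / math.sqrt(2.0 * math.pi)

def partial_sum(f, x, order):
    """Truncation of (5.17) to |n| <= order, evaluated at the point x."""
    return sum(fourier_coefficient(f, n) * cmath.exp(1j * n * x)
               for n in range(-order, order + 1)) / math.sqrt(2.0 * math.pi)

identity = lambda x: x
for n in (1, 2, 3):
    exact = 1j * math.sqrt(2.0 * math.pi) * (-1) ** n / n
    print(n, fourier_coefficient(identity, n), exact)
print(partial_sum(identity, 1.0, 40).real)   # slowly approaches f(1) = 1
```

The coefficients are computed once from $f$ and do not depend on where the series is truncated, in line with the remark that the $c_n$ are independent of $M$.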


5.1.5 Spherical Harmonic Functions 


We have derived the sets of Legendre polynomials and trigonometric functions
from the Weierstrass approximation theorem in one and two variables,
respectively. We now derive the set of spherical harmonics from a three-variable
generalization. It tells us that a function $g$ of $x, y, z$ (i.e., $\boldsymbol{r}$) can be approximated
uniformly by a sequence of partial sums given by

$$g_M(\boldsymbol{r}) = \sum_{j,k,n=0}^{M} a_{jkn}^{(M)} x^j y^k z^n. \tag{5.19}$$

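We may also use an alternative coordinate system such as

$$u = x + iy = r\sin\theta\, e^{i\phi},$$

$$v = x - iy = r\sin\theta\, e^{-i\phi},$$

$$w = z = r\cos\theta,$$

which yields

$$g_M(\boldsymbol{r}) = \sum_{\alpha,\beta,\gamma=0}^{M} b_{\alpha\beta\gamma}^{(M)} u^{\alpha} v^{\beta} w^{\gamma} \tag{5.20}$$
$$= \sum_{l=0}^{3M} r^l \sum_{(\alpha,\beta,\gamma)} b_{\alpha\beta\gamma}^{(M)}\, e^{i(\alpha - \beta)\phi} \sin^{\alpha+\beta}\theta\, \cos^{\gamma}\theta. \tag{5.21}$$

In (5.21), the symbol $\sum_{(\alpha,\beta,\gamma)}$ indicates taking the sums over combinations
of $\alpha, \beta, \gamma$ subject to the condition $\alpha + \beta + \gamma = l$. [Note that the sum over all
$l$ in effect removes the restriction on $\alpha, \beta, \gamma$ and gives the same results as the
original unrestricted sum in (5.20).]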

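We now restrict $\boldsymbol{r}$ to the unit sphere by requiring that $|\boldsymbol{r}| = 1$, and
introduce an index $m = \alpha - \beta$. The expression (5.21) is then rewritten in
the form

$$g_M(\theta, \phi) = \sum_{l=0}^{3M} \sum_{(\alpha,\beta,\gamma)} b_{\alpha\beta\gamma}^{(M)}\, e^{im\phi} \sin^{\alpha+\beta-|m|}\theta\, \cos^{\gamma}\theta\, \sin^{|m|}\theta.$$

A trigonometric identity gives

$$\sin^{\alpha+\beta-|m|}\theta\, \cos^{\gamma}\theta = \left(1 - \cos^2\theta\right)^{(\alpha+\beta-|m|)/2} \cos^{\gamma}\theta,$$

which is a polynomial in $\cos\theta$ of maximum degree $\alpha + \beta + \gamma - |m| = l - |m|$,
since $\alpha + \beta - |m|$ is even (see the remark below). Denoting this polynomial
by $f_{lm}(\cos\theta)$, we get

$$g_M(\theta, \phi) = \sum_{l=0}^{3M} \sum_{m} b_{lm}^{(M)}\, e^{im\phi} \sin^{|m|}\theta\, f_{lm}(\cos\theta). \tag{5.22}$$

Remark. That $\alpha + \beta - |m|$ is even is seen by observing the identity

$$\alpha + \beta - |m| = m - |m| + 2\beta.$$

On the right-hand side, $2\beta$ is even and

$$m - |m| = \begin{cases} 0 & \text{if } m \ge 0, \\ 2m & \text{if } m < 0. \end{cases}$$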



The range of the summation over $m$ still has to be specified. Recall that all the
$\alpha, \beta, \gamma$ are nonnegative integers subject to the condition that $\alpha + \beta + \gamma = l \ge 0$.
This is illustrated schematically in Fig. 5.2, in which the point $(\alpha, \beta, \gamma)$ must
lie on the oblique face of the tetrahedron depicted in the $\alpha\beta\gamma$ space. The line
$\alpha - \beta = m$ on the $\alpha\beta$ plane is shown as a solid line. In order for it to intersect
the oblique face, $m$ must satisfy the condition that

$$-l \le m \le l.$$

Therefore, the sum over $m$ in (5.22) is restricted to $|m| \le l$, and the last
equation becomes

$$g_M(\theta, \phi) = \sum_{l=0}^{3M} \sum_{m=-l}^{l} b_{lm}^{(M)}\, Y_{lm}(\theta, \phi). \tag{5.23}$$

Here the sequence of functions

$$Y_{lm}(\theta, \phi) \equiv e^{im\phi} \sin^{|m|}\theta\, f_{lm}(\cos\theta), \tag{5.24}$$

where $f_{lm}(\cos\theta)$ is a polynomial in $\cos\theta$ of degree $l - |m|$, provides a uniform
approximation to any continuous function defined on the unit sphere. The
functions $Y_{lm}$ are called spherical harmonics. Note that for a given $l$, there
are $2l + 1$ functions $Y_{lm}$.

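Fig. 5.2. The solid and dashed-dotted lines shown on the $\alpha\beta$-plane indicate the
relation $\alpha - \beta = m$ for $-l < m < l$ and $m = \pm l$, respectively. In order for the point
$(\alpha, \beta, \gamma)$ to be on the oblique face of the tetrahedron, the condition $-l \le m \le l$ should
be satisfied so that the solid line intersects the line segment AB on the $\alpha\beta$-plane

The orthonormality of the set $\{Y_{lm}\}$ is characterized by the relation

$$\int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, d\theta\; Y_{lm}^*(\theta, \phi)\, Y_{l'm'}(\theta, \phi) = \delta_{ll'}\delta_{mm'},$$

which determines the functions $Y_{lm}$ uniquely up to a phase factor.
General equations for the $Y_{lm}$ are

$$Y_{lm}(\theta, \phi) = (-1)^m \left[\frac{2l+1}{4\pi} \frac{(l-m)!}{(l+m)!}\right]^{1/2} P_l^m(\cos\theta)\, e^{im\phi}, \quad m \ge 0,$$

together with $Y_{l,-m} = (-1)^m Y_{lm}^*$ for negative indices, where the functions

$$P_l^m(x) = \left(1 - x^2\right)^{m/2} \frac{d^m}{dx^m} P_l(x) = \frac{1}{2^l l!} \left(1 - x^2\right)^{m/2} \frac{d^{l+m}}{dx^{l+m}} \left(x^2 - 1\right)^l, \quad m \ge 0, \tag{5.25}$$

are called the associated Legendre functions.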


Remark. 


1. The normalization constant of the $Y_{lm}$ follows immediately from the
orthogonality relations for the associated Legendre functions:

$$\int_{-1}^{1} P_l^m(x) P_{l'}^m(x)\, dx = \frac{2}{2l+1} \frac{(l+m)!}{(l-m)!}\, \delta_{ll'}.$$

There is, of course, a free choice of phase factor; ours is a common choice
in the physics literature. However, one must be careful because different
authors choose different phase factors for the spherical harmonics.

2. We should note that the associated Legendre functions $P_l^m(x)$ are not
another orthonormal set of polynomials on $[-1, 1]$. In fact, for odd $m$ they are
not polynomials at all, owing to the factor $(1 - x^2)^{m/2}$ in equation (5.25).
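
The orthonormality relation can be checked numerically for the lowest members of the set. The sketch below hard-codes the functions with $l \le 1$ in the phase convention used in this section and integrates over the sphere with a midpoint rule. The grid size and the function names `Y` and `overlap` are illustrative assumptions.

```python
import cmath
import math

def Y(l, m, theta, phi):
    """Spherical harmonics for l <= 1 in the phase convention of this section."""
    if (l, m) == (0, 0):
        return complex(math.sqrt(1.0 / (4.0 * math.pi)))
    if (l, m) == (1, 0):
        return complex(math.sqrt(3.0 / (4.0 * math.pi)) * math.cos(theta))
    if (l, m) in ((1, 1), (1, -1)):
        sign = -1.0 if m == 1 else 1.0
        return (sign * math.sqrt(3.0 / (8.0 * math.pi))
                * math.sin(theta) * cmath.exp(1j * m * phi))
    raise ValueError("only l <= 1 is tabulated in this sketch")

def overlap(lm1, lm2, steps=60):
    """Midpoint-rule estimate of the integral of Y*_{lm1} Y_{lm2} over the sphere."""
    d_theta, d_phi = math.pi / steps, 2.0 * math.pi / steps
    total = 0j
    for a in range(steps):
        theta = (a + 0.5) * d_theta
        for b in range(steps):
            phi = (b + 0.5) * d_phi
            total += (Y(*lm1, theta, phi).conjugate() * Y(*lm2, theta, phi)
                      * math.sin(theta))
    return total * d_theta * d_phi

states = [(0, 0), (1, -1), (1, 0), (1, 1)]
for s in states:
    # Rows of the overlap matrix; it should be close to the identity.
    print([round(abs(overlap(s, t)), 3) for t in states])
```

Changing the sign of any one function leaves the matrix of absolute overlaps unchanged, which reflects the freedom in the choice of phase factor.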


Exercises 


1. Find the normalized Legendre polynomials $\tilde{P}_n(x)$.

Solution: Using equation (5.13), we write the normalized Legendre
polynomials $\tilde{P}_n(x)$ as

$$\tilde{P}_n(x) = \sqrt{\frac{2n+1}{2}}\, P_n(x), \quad n = 0, 1, 2, \cdots. \quad ♣$$


2. Derive the explicit form of each function: $Y_{00}$, $Y_{11}$, $Y_{10}$, and $Y_{1,-1}$.

Solution: It follows from (5.24) that for $l = m = 0$, we obtain
$Y_{00} = \sqrt{1/(4\pi)}$. If $l = 1$, then $m$ can equal $-1$, $0$, or $+1$. Recalling
that $f_{lm}(\cos\theta)$ is a polynomial in $\cos\theta$ of degree $l - |m|$, we obtain
$Y_{10} = c_1\cos\theta + c_2$, $Y_{11} = c_3 e^{i\phi}\sin\theta$, $Y_{1,-1} = c_4 e^{-i\phi}\sin\theta$. The
constants $c_1, c_2, c_3, c_4$ are determined by imposing orthonormality.
For instance,

$$\int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, Y_{00}^* Y_{10}\, d\theta = \sqrt{\pi} \int_0^{\pi} d\theta\, \sin\theta\, (c_1\cos\theta + c_2) = \delta_{01}\delta_{00} = 0,$$

$$\int_0^{2\pi} d\phi \int_0^{\pi} \sin\theta\, |Y_{10}|^2\, d\theta = 2\pi \int_0^{\pi} d\theta\, \sin\theta\, (c_1\cos\theta + c_2)^2 = \delta_{11}\delta_{00} = 1,$$

which result in $c_1 = \sqrt{3/(4\pi)}$ and $c_2 = 0$. Similarly, it follows
that $c_3 = -c_4 = -\sqrt{3/(8\pi)}$. We choose the minus sign with the
convention to be adopted later. Therefore, the first few members
of the set $\{Y_{lm}\}$ are

$$Y_{00} = \sqrt{\frac{1}{4\pi}}, \quad Y_{11} = -\sqrt{\frac{3}{8\pi}}\, e^{i\phi}\sin\theta,$$

$$Y_{10} = \sqrt{\frac{3}{4\pi}}\cos\theta, \quad Y_{1,-1} = \sqrt{\frac{3}{8\pi}}\, e^{-i\phi}\sin\theta. \quad ♣$$
An : 8a 


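3. From the generating function of Legendre polynomials determine that

(i) $P_n(1) = 1$, $P_n(-1) = (-1)^n$,
(ii) $P_{2n}(0) = (-1)^n \dfrac{(2n-1)!!}{(2n)!!}$, $P_{2n+1}(0) = 0$ with $(-1)!! = 1$,
(iii) $\displaystyle\int_{-1}^{1} x^n P_n(x)\, dx = \frac{2^{n+1} (n!)^2}{(2n+1)!}$.

Solution: We use the equation

$$\left(1 - 2tx + t^2\right)^{-1/2} = \sum_{n=0}^{\infty} P_n(x)\, t^n.$$

(i) For $x = 1$, we have

$$\frac{1}{1 - t} = \sum_{n=0}^{\infty} t^n = \sum_{n=0}^{\infty} P_n(1)\, t^n,$$

which yields $P_n(1) = 1$. Similarly for $x = -1$, we obtain

$$\frac{1}{1 + t} = \sum_{n=0}^{\infty} (-1)^n t^n = \sum_{n=0}^{\infty} P_n(-1)\, t^n,$$

which gives $P_n(-1) = (-1)^n$.

(ii) For $x = 0$, we have

$$\left(1 + t^2\right)^{-1/2} = \sum_{n=0}^{\infty} (-1)^n \frac{(2n-1)!!}{(2n)!!}\, t^{2n} = \sum_{m=0}^{\infty} P_m(0)\, t^m.$$

Then, we have the desired result.

(iii) We get the relation by performing an integration by parts $n$
times. ♣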


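4. Show that the Coulomb potential at $\mathbf{r} = \mathbf{r}_0$ produced by a unit
charge at $z = a$ on the $z$-axis is given by

$$V(\mathbf{r}_0) = \frac{1}{4\pi\varepsilon_0 a}\sum_{n=0}^{\infty}\left(\frac{r_0}{a}\right)^{n} P_n(\cos\theta),$$

where $\theta$ is the angle between the $z$-axis and the vector $\mathbf{r}_0$, and $a$ satisfies the
condition $r_0 < a$.

Solution: Using the generating function of Legendre polynomials,
we have

$$V(\mathbf{r}_0) = \frac{1}{4\pi\varepsilon_0 |\mathbf{r}_0 - \mathbf{a}|} = \frac{1}{4\pi\varepsilon_0}\frac{1}{\sqrt{r_0^2 + a^2 - 2ar_0\cos\theta}} = \frac{1}{4\pi\varepsilon_0 a}\sum_{n=0}^{\infty}\left(\frac{r_0}{a}\right)^{n} P_n(\cos\theta).$$

This series converges because $r_0 < a$ and $|P_n(\cos\theta)| \le 1$. ♣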
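
The expansion in Exercise 4 can be checked numerically. The following is a minimal sketch, assuming units in which $4\pi\varepsilon_0 = 1$ (chosen only to keep the numbers simple); the function names are illustrative and the Legendre values are generated with the standard three-term recurrence.

```python
import math

def legendre_values(n_max, x):
    """Return [P_0(x), ..., P_n_max(x)] from (n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}."""
    values = [1.0, x]
    for n in range(1, n_max):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[:n_max + 1]

def potential_series(r0, a, theta, n_max=60):
    """Partial sum of (1/a) * sum_n (r0/a)^n P_n(cos theta); assumes 4*pi*eps0 = 1."""
    p = legendre_values(n_max, math.cos(theta))
    return sum((r0 / a) ** n * p[n] for n in range(n_max + 1)) / a

def potential_exact(r0, a, theta):
    """Closed form 1/|r0 - a| for a unit charge at distance a on the z-axis."""
    return 1.0 / math.sqrt(r0 ** 2 + a ** 2 - 2 * a * r0 * math.cos(theta))

if __name__ == "__main__":
    print(potential_series(0.5, 1.0, 0.7), potential_exact(0.5, 1.0, 0.7))
```

Because the terms decay like $(r_0/a)^n$, the partial sum converges quickly when $r_0$ is well inside the radius $a$ and slowly as $r_0 \to a$.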


5.2 Classification of Orthonormal Functions 


5.2.1 General Rodrigues Formula 


In the previous section we saw that several kinds of orthonormal polynomials
can be produced through the Gram-Schmidt orthogonalization process by
starting with $1, x, x^2, \cdots$. However, there is a more elegant approach that
applies to most polynomials of interest to physicists. This section describes this
approach, which is based on the Rodrigues formula and classifies various
orthogonal polynomials in terms of the parameters involved in the formula.


♠ General Rodrigues formula:

$$Q_n(x) = \frac{1}{K_n w(x)}\frac{d^n}{dx^n}\left[w(x)s^n(x)\right], \tag{5.26}$$

where it is assumed that
1. $Q_1(x)$ is a first-degree polynomial in $x$.
2. $s(x)$ is a polynomial in $x$ of degree no more than 2 with real roots.
3. $w(x)$ is real, positive, and integrable in the interval $[a,b]$ and satisfies
the boundary condition

$$w(a)s(a) = w(b)s(b) = 0.$$

Equation (5.26) under the three conditions noted above provides the sequence
of functions $(Q_0(x), Q_1(x), Q_2(x), \cdots)$ that forms an orthogonal set of
polynomials on the interval $[a,b]$ with a weight function $w(x)$, which can be
normalized by a suitable choice of constants $K_n$. For historical reasons, different
polynomial functions are normalized differently, which is why $K_n$ is introduced
here. For the time being, we omit $K_n$ without loss of generality.


♠ Theorem:

The function $Q_n(x)$ defined by (5.26) is a polynomial in $x$ of the $n$th
degree and satisfies the orthogonality relation on the interval $[a,b]$ with
weight $w(x)$:

$$\int_a^b p_m(x)Q_n(x)w(x)\,dx = 0 \quad (m < n), \tag{5.27}$$

where $p_m(x)$ is an arbitrary polynomial of degree $m < n$.


Proof From hypothesis, we have

$$\left.\frac{d^m}{dx^m}\left[w(x)s^n(x)\right]\right|_{x=a,\,b} = 0 \quad (\text{if } m < n) \tag{5.28}$$

and

$$\frac{d^m}{dx^m}\left[w(x)s^n(x)p_{(\le k)}(x)\right] = w(x)s^{n-m}(x)p_{(\le k+m)}(x), \tag{5.29}$$

where the symbol $p_{(\le k)}(x)$ denotes an arbitrary polynomial in $x$ of degree
$\le k$. Then, integrating (5.27) by parts $n$ times, we obtain for $m < n$,

$$\int_a^b p_m(x)Q_n(x)w(x)\,dx = \int_a^b p_m(x)\frac{d^n}{dx^n}\left[w(x)s^n(x)\right]dx = (-1)^n\int_a^b w(x)s^n(x)\frac{d^n p_m(x)}{dx^n}\,dx = 0, \tag{5.30}$$

where we used (5.26) and (5.28): the boundary terms vanish by (5.28), and
$d^n p_m/dx^n = 0$ because $m < n$. Next we examine whether or not $Q_n$ is a
polynomial of degree $n$. Set $m = n$ and $k = 0$ in (5.29) to obtain

$$\frac{1}{w(x)}\frac{d^n}{dx^n}\left[w(x)s^n(x)\right] = Q_n(x) = p_{(\le n)}(x),$$

which indicates that $Q_n(x)$ is a polynomial of degree no more than $n$. We thus
tentatively write

$$Q_n(x) = p_{(\le n-1)}(x) + a_n x^n, \tag{5.31}$$

and would like to show that $a_n \ne 0$. Multiplying both sides of (5.31) by
$Q_n(x)w(x)$ and integrating over $[a,b]$ yields

$$\int_a^b [Q_n(x)]^2 w(x)\,dx = \int_a^b p_{(\le n-1)}(x)Q_n(x)w(x)\,dx + a_n\int_a^b x^n Q_n(x)w(x)\,dx = a_n\int_a^b x^n Q_n(x)w(x)\,dx,$$

where we used (5.30). Since the left-hand side is strictly positive, this proves
that $a_n \ne 0$, i.e., that $Q_n(x)$ is a polynomial of the $n$th degree. ♣
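
As a concrete illustration of the theorem, the sketch below evaluates the Rodrigues formula exactly for one admissible choice, $w(x) = 1$ and $s(x) = 1 - x^2$ on $[-1,1]$, using rational arithmetic. The constant $K_n = (-1)^n 2^n n!$ is an assumption (it is the conventional Legendre normalization and is not fixed by the text above), and the helper names are illustrative.

```python
from fractions import Fraction
from math import factorial

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists [c0, c1, ...]."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:] or [Fraction(0)]

def rodrigues_legendre(n):
    """Q_n = (1/K_n) d^n/dx^n [(1 - x^2)^n], with the assumed K_n = (-1)^n 2^n n!."""
    poly = [Fraction(1)]
    for _ in range(n):
        poly = poly_mul(poly, [Fraction(1), Fraction(0), Fraction(-1)])  # times (1 - x^2)
    for _ in range(n):
        poly = poly_diff(poly)
    k_n = (-1) ** n * 2 ** n * factorial(n)
    return [c / k_n for c in poly]

def integrate_on_unit_interval(p):
    """Exact integral over [-1, 1]: x^k contributes 2/(k+1) for even k, 0 for odd k."""
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(p) if k % 2 == 0)

if __name__ == "__main__":
    q2, q3 = rodrigues_legendre(2), rodrigues_legendre(3)
    print(q2, q3)
    print(integrate_on_unit_interval(poly_mul(q2, q3)))  # orthogonality for m != n
```

With this choice the output polynomials have exactly degree $n$ and are mutually orthogonal with weight 1, in line with the two statements of the theorem.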


5.2.2 Classification of the Polynomials 


In what follows, we classify the orthogonal polynomials that are derived from
the Rodrigues formula (5.26) according to the three conditions noted earlier.
By condition 1 associated with (5.26), $Q_1(x)$ is a first-degree polynomial,
and we can define it as

$$Q_1(x) = -\frac{x}{K_1}. \tag{5.32}$$

Then the Rodrigues formula (5.26) with $n = 1$ reads

$$\frac{1}{w}\frac{dw}{dx} = -\frac{x + (ds/dx)}{s}. \tag{5.33}$$

Recall that $s(x)$ can be a zeroth-, first-, or second-degree polynomial. In
each case, we can find an appropriate weight function $w(x)$ that satisfies the
differential equation (5.33) as well as the boundary condition 3:

$$w(a)s(a) = w(b)s(b) = 0. \tag{5.34}$$

These considerations determine the explicit forms of the possible functions $s(x)$ and
$w(x)$ under conditions 1-3 in Sect. 5.2.1 and thus allow classification of all
of the orthogonal polynomials provided by the general Rodrigues formula,
as described below.


Hermite polynomials: 


We first consider the case in which $s(x)$ is a zeroth-degree polynomial, i.e., a
constant given by

$$s(x) = \alpha.$$

Equation (5.33) takes the form

$$\frac{1}{w}\frac{dw}{dx} = -\frac{x}{\alpha}$$

and has the solution

$$w(x) = A\exp\left(-\frac{x^2}{2\alpha}\right) \quad \text{with a constant } A. \tag{5.35}$$

The product $w(x)s(x)$ vanishes only at $x = \pm\infty$, provided that $\alpha > 0$. To
satisfy the conditions in (5.34), we have to set

$$a = -\infty, \quad b = +\infty.$$

The constants $A$ and $\alpha$ affect only the multiplicative factor in front of each
polynomial. Thus, without loss of generality, we can take $\alpha = 1$ and $A = 1$,
which yields

$$w(x) = e^{-x^2/2}.$$

The complete orthonormal polynomials corresponding to this case are known
as Hermite polynomials, designated by $H_n(x)$, and satisfy the orthonormal
condition

$$\int_{-\infty}^{\infty} e^{-x^2/2} H_m(x)H_n(x)\,dx = \delta_{mn}.$$

Laguerre polynomials: 


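Next we let $s(x)$ be a polynomial of the first degree, such as

$$s(x) = \beta(x - \alpha).$$

Equation (5.33) now becomes

$$\frac{1}{w}\frac{dw}{dx} = -\frac{x + \beta}{\beta(x - \alpha)},$$

which has the solution

$$w(x) = \text{const.} \times (x - \alpha)^{\nu} e^{-x/\beta},$$

where

$$\nu = -1 - \frac{\alpha}{\beta}.$$

If $\beta > 0$ and $\nu > -1$, then $s(x)w(x)$ vanishes at $x = \alpha$ and $x = +\infty$, and
$w(x)$ is integrable in the interval $[\alpha, +\infty)$. The simplest choice is therefore to
shift and rescale $x$ so that the root lies at the origin and $\beta = 1$, keeping $\nu$ as
a free parameter, which yields

$$w(x) = x^{\nu}e^{-x}, \quad a = 0, \quad b = +\infty.$$

These choices result in the Laguerre polynomials, commonly denoted by
$L_n^{\nu}(x)$, whose orthonormality relation is given by

$$\int_0^{\infty} x^{\nu}e^{-x}L_m^{\nu}(x)L_n^{\nu}(x)\,dx = \delta_{mn} \quad \text{with } \nu > -1.$$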


Jacobi polynomials: 


Finally, let us take

$$s(x) = \gamma(x - \alpha)(\beta - x), \quad \beta > \alpha.$$

Here we assume that $s(x)$ has two distinct roots. [If $s(x)$ has a double root, the
boundary condition (5.34) cannot be satisfied, since in this case the function
$s(x)w(x)$ cannot vanish at more than one point.] Equation (5.33)
now reads

$$\frac{1}{w}\frac{dw}{dx} = -\frac{x + \gamma(\beta - x) - \gamma(x - \alpha)}{\gamma(x - \alpha)(\beta - x)},$$

which has the solution

$$w(x) = \text{const.} \times (x - \alpha)^{\mu}(\beta - x)^{\nu},$$

with

$$\mu = -1 - \frac{\alpha}{\gamma(\beta - \alpha)} \quad \text{and} \quad \nu = -1 + \frac{\beta}{\gamma(\beta - \alpha)}.$$

If $\mu > -1$ and $\nu > -1$, then $s(x)w(x)$ vanishes at $x = \alpha$ and $x = \beta$, and $w(x)$
is integrable on the interval $[\alpha, \beta]$. With the replacement

$$x \to \frac{2x - \alpha - \beta}{\beta - \alpha},$$

apart from multiplicative factors, we obtain

$$w(x) = (1 - x)^{\nu}(1 + x)^{\mu} \quad \text{with } \nu, \mu > -1, \quad a = -1, \quad b = +1.$$

The corresponding complete orthonormal polynomials are called the Jacobi
polynomials $G_n^{\mu,\nu}(x)$, and satisfy the relation

$$\int_{-1}^{1} (1 - x)^{\nu}(1 + x)^{\mu}\,G_m^{\mu,\nu}(x)\,G_n^{\mu,\nu}(x)\,dx = \delta_{mn} \quad \text{with } \nu, \mu > -1.$$

Remark. Jacobi polynomials can be divided into subcategories depending on
the values of $\mu$ and $\nu$. The most common and widely used in mathematical
physics are collected in Table 5.1.


Table 5.1. Special cases of Jacobi polynomials

$\mu$             $\nu$             $w(x)$                       Polynomial
$\lambda - 1/2$   $\lambda - 1/2$   $(1 - x^2)^{\lambda - 1/2}$   Gegenbauer, $C_n^{\lambda}(x)$, $\lambda > -1/2$
$0$               $0$               $1$                          Legendre, $P_n(x)$
$-1/2$            $-1/2$            $(1 - x^2)^{-1/2}$           Chebyshev of the first kind, $T_n(x)$
$1/2$             $1/2$             $(1 - x^2)^{1/2}$            Chebyshev of the second kind, $U_n(x)$


5.2.3 The Recurrence Formula

We now show that all the orthogonal polynomials derived from the Rodrigues
formula (5.26) satisfy the following relation:

♠ Recurrence formula:

$$Q_{n+1}(x) = (a_n x + b_n)Q_n(x) - c_n Q_{n-1}(x) \quad (n = 1, 2, \cdots), \tag{5.36}$$

where the constants $a_n$, $b_n$, and $c_n$ depend on the class of polynomials
considered.


Proof The only property needed for the proof of (5.36) is the orthogonality
relation:

$$\int_a^b Q_n(x)p_{(<n)}(x)w(x)\,dx = 0, \tag{5.37}$$

where the symbol $p_{(<n)}(x)$ denotes an arbitrary polynomial in $x$ of degree less
than $n$. For convenience, we introduce the following notation:

$$\xi_n = \text{coefficient of } x^n \text{ in } Q_n(x), \qquad \eta_n = \text{coefficient of } x^{n-1} \text{ in } Q_n(x), \tag{5.38}$$

$$I_n = \int_a^b Q_n^2(x)w(x)\,dx. \tag{5.39}$$

It then follows that

$$Q_{n+1}(x) - \frac{\xi_{n+1}}{\xi_n}xQ_n(x) = \sum_{j=0}^{n} r_j^{(n)}Q_j(x)$$

because the left-hand side is a polynomial of degree $\le n$; the $r_j^{(n)}$ are appropriate
constants determined by the left-hand side. Multiplying both sides by $wQ_m$,
integrating over $[a,b]$, taking $m$ equal to $0, 1, 2, \cdots, n-2$ successively, and
using the orthogonality relation (5.37), we obtain

$$r_m^{(n)} = 0 \quad \text{for } m = 0, 1, 2, \cdots, n-2.$$

Thus

$$Q_{n+1}(x) - \frac{\xi_{n+1}}{\xi_n}xQ_n(x) = r_n^{(n)}Q_n(x) + r_{n-1}^{(n)}Q_{n-1}(x), \tag{5.40}$$

which is the recurrence formula we are looking for. ♣


5.2.4 Coefficients of the Recurrence Formula 


We now have to find the constants $r_n^{(n)}$ and $r_{n-1}^{(n)}$ in (5.40). In view of the
orthogonality relation (5.37), we have

$$I_n = \int_a^b Q_n^2(x)w(x)\,dx = \xi_n\int_a^b Q_n(x)x^n w(x)\,dx. \tag{5.41}$$

Multiplying (5.40) by $wQ_{n-1}$ and integrating, we obtain

$$r_{n-1}^{(n)}I_{n-1} = -\frac{\xi_{n+1}}{\xi_n}\int_a^b Q_n(x)Q_{n-1}(x)\,x\,w(x)\,dx = -\frac{\xi_{n+1}\xi_{n-1}}{\xi_n}\int_a^b Q_n(x)x^n w(x)\,dx = -\frac{\xi_{n+1}\xi_{n-1}}{\xi_n^2}I_n.$$

Therefore

$$r_{n-1}^{(n)} = -\frac{\xi_{n+1}\xi_{n-1}}{\xi_n^2}\frac{I_n}{I_{n-1}}. \tag{5.42}$$

Substituting this into (5.40) and comparing the coefficients of $x^n$ on both
sides yields

$$r_n^{(n)} = \frac{\eta_{n+1}}{\xi_n} - \frac{\xi_{n+1}\eta_n}{\xi_n^2}. \tag{5.43}$$

Finally, it follows from (5.40)-(5.43) that the coefficients $a_n$, $b_n$, and $c_n$ defined
in (5.36) become

$$a_n = \frac{\xi_{n+1}}{\xi_n}, \qquad b_n = \frac{\xi_{n+1}}{\xi_n}\left(\frac{\eta_{n+1}}{\xi_{n+1}} - \frac{\eta_n}{\xi_n}\right), \qquad c_n = \frac{I_n}{I_{n-1}}\frac{\xi_{n+1}\xi_{n-1}}{\xi_n^2}. \tag{5.44}$$

The constants $\xi_n$ and $\eta_n$ can, in principle, be found from the Rodrigues
formula once the functions $s(x)$ and $w(x)$ as well as the constants $K_n$ have
been fixed. The constants $I_n$, which determine the normalization of the
polynomials, are given by

$$I_n = \frac{(-1)^n\xi_n n!}{K_n}\int_a^b [s(x)]^n w(x)\,dx.$$

This follows immediately from the Rodrigues formula if we integrate $n$ times
by parts the integral

$$I_n = \int_a^b Q_n^2(x)w(x)\,dx = \xi_n\int_a^b Q_n(x)x^n w(x)\,dx = \frac{\xi_n}{K_n}\int_a^b x^n\frac{d^n}{dx^n}\left[w(x)s^n(x)\right]dx.$$

Although the explicit form of the coefficients given in (5.44) seems rather
complicated, it simplifies considerably for each specific family of orthogonal
polynomials.
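
To see how the general coefficients collapse in a specific case, the sketch below evaluates $a_n$, $b_n$, and $c_n$ of (5.44) for Legendre polynomials in exact rational arithmetic. The inputs are standard Legendre facts that are assumed here rather than derived in this section: leading coefficient $(2n)!/[2^n (n!)^2]$, a vanishing coefficient of $x^{n-1}$ by parity, and $I_n = 2/(2n+1)$.

```python
from fractions import Fraction
from math import factorial

def xi(n):
    """Leading coefficient of P_n (assumed standard value)."""
    return Fraction(factorial(2 * n), 2 ** n * factorial(n) ** 2)

def eta(n):
    """Coefficient of x^(n-1) in P_n; it vanishes because P_n has definite parity."""
    return Fraction(0)

def norm(n):
    """I_n = integral of P_n^2 over [-1, 1] (assumed standard value)."""
    return Fraction(2, 2 * n + 1)

def recurrence_coefficients(n):
    """Evaluate a_n, b_n, c_n from (5.44) for n >= 1."""
    a_n = xi(n + 1) / xi(n)
    b_n = a_n * (eta(n + 1) / xi(n + 1) - eta(n) / xi(n))
    c_n = norm(n) / norm(n - 1) * xi(n + 1) * xi(n - 1) / xi(n) ** 2
    return a_n, b_n, c_n

if __name__ == "__main__":
    for n in range(1, 5):
        print(n, recurrence_coefficients(n))
```

The values reduce to $a_n = (2n+1)/(n+1)$, $b_n = 0$, and $c_n = n/(n+1)$, which is the Legendre recurrence $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$.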


5.2.5 Roots of Orthogonal Polynomials 


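Consider the recurrence formula (5.36) in which the polynomials $Q_n(x)$ are
normalized, so that from (5.39) $I_n = 1$ $(n = 0, 1, 2, \cdots)$. After some
rearrangement, the equation takes the form

$$xQ_n(x) = \frac{\xi_n}{\xi_{n+1}}Q_{n+1}(x) + \beta_n Q_n(x) + \frac{\xi_{n-1}}{\xi_n}Q_{n-1}(x),$$

where

$$\beta_n = \frac{\eta_n}{\xi_n} - \frac{\eta_{n+1}}{\xi_{n+1}}.$$

The matrix form is given by

$$x\begin{pmatrix} Q_0 \\ Q_1 \\ Q_2 \\ \vdots \\ Q_{N-1} \end{pmatrix} = \begin{pmatrix} \beta_0 & \xi_0/\xi_1 & 0 & \cdots & 0 \\ \xi_0/\xi_1 & \beta_1 & \xi_1/\xi_2 & \cdots & 0 \\ 0 & \xi_1/\xi_2 & \beta_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \beta_{N-1} \end{pmatrix}\begin{pmatrix} Q_0 \\ Q_1 \\ Q_2 \\ \vdots \\ Q_{N-1} \end{pmatrix} + \begin{pmatrix} 0 \\ 0 \\ 0 \\ \vdots \\ (\xi_{N-1}/\xi_N)\,Q_N \end{pmatrix},$$

which gives the eigenvalue equations, provided that $\{x_i\}$ are the roots of the
polynomial equation $Q_N(x) = 0$, such that

$$J\,R(x_i) = x_i R(x_i),$$

where $J$ denotes the $N \times N$ tridiagonal matrix above and the column vector
$R(x_i)$ is defined by

$$R(x_i) = [Q_0(x_i), Q_1(x_i), \cdots, Q_{N-1}(x_i)]^{T}.$$

Thus, the eigenvalues of the $N \times N$ matrix $J$ are the zeros of $Q_N(x)$. The
matrix is called the Jacobi matrix associated with the sequence $\{Q_n(x)\}$.
Since $J$ is symmetric, the eigenvalues $\{x_i\}$ are real. We thus have proved the
following theorem:

♠ Theorem:

The eigenvalues $\{x_i\}$ $(i = 1, 2, \cdots, N)$ of the matrix $J$ are the
zeros of $Q_N(x)$. The eigenvector belonging to $x_i$ is
$R(x_i) = [Q_0(x_i), Q_1(x_i), \cdots, Q_{N-1}(x_i)]^{T}$.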
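
The eigenvalue statement can be illustrated with orthonormal Legendre polynomials, $Q_n = \sqrt{(2n+1)/2}\,P_n$. For this family the diagonal entries vanish and the off-diagonal entries are $n/\sqrt{4n^2-1}$; both values are assumptions taken from standard Legendre facts, not from the text. The sketch checks that $J R(x) = x R(x)$ holds at a zero of $Q_N$ and fails elsewhere.

```python
import math

def orthonormal_legendre(n_max, x):
    """Return [Q_0(x), ..., Q_n_max(x)] with Q_n = sqrt((2n+1)/2) * P_n."""
    p = [1.0, x]
    for n in range(1, n_max):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return [math.sqrt((2 * n + 1) / 2) * p[n] for n in range(n_max + 1)]

def jacobi_matrix(size):
    """Symmetric tridiagonal Jacobi matrix for orthonormal Legendre polynomials."""
    j = [[0.0] * size for _ in range(size)]
    for n in range(1, size):
        off_diagonal = n / math.sqrt(4 * n * n - 1)  # assumed ratio of leading coefficients
        j[n - 1][n] = j[n][n - 1] = off_diagonal
    return j

def residual(size, x):
    """Largest component of J R(x) - x R(x), with R = [Q_0(x), ..., Q_{size-1}(x)]."""
    r = orthonormal_legendre(size, x)[:size]
    j = jacobi_matrix(size)
    return max(
        abs(sum(j[k][m] * r[m] for m in range(size)) - x * r[k])
        for k in range(size)
    )

if __name__ == "__main__":
    root = math.sqrt(3 / 5)          # a zero of P_3
    print(residual(3, root))         # essentially zero
    print(residual(3, 0.5))          # not a zero of P_3, so the residual is finite
```

In numerical practice one would diagonalize $J$ directly to obtain all zeros at once; the residual check above is used only because it needs no linear-algebra library.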


5.2.6 Differential Equations Satisfied by the Polynomials 


Historically, most orthogonal polynomials were discovered as solutions of
differential equations. Here we give a single generic differential equation that
is satisfied by all the polynomials $Q_n$.

♠ Theorem:

All of the orthogonal polynomials $Q_n(x)$ derived from the general
Rodrigues formula (5.26) satisfy the differential equation

$$\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n(x)}{dx}\right] = -w(x)\lambda_n Q_n(x)$$

with the constant

$$\lambda_n = -n\left[K_1\frac{dQ_1}{dx} + \frac{n-1}{2}\frac{d^2 s}{dx^2}\right].$$

Proof Since $dQ_n(x)/dx$ is a polynomial of degree $\le (n-1)$, it follows from
(5.29) that the function

$$\frac{1}{w(x)}\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n(x)}{dx}\right]$$

is a polynomial of degree $\le n$. Thus, we can write

$$\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n(x)}{dx}\right] = -w(x)\sum_{i=0}^{n}\lambda_n^{(i)}Q_i(x), \tag{5.45}$$

where the $\lambda_n^{(i)}$ are undetermined constants. Multiplying both sides of (5.45)
by $Q_m$ and integrating, we get

$$\int_a^b Q_m(x)\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n(x)}{dx}\right]dx = -\lambda_n^{(m)}I_m. \tag{5.46}$$

Here $I_m$ is an integral given by (5.39). Integrating by parts, for $m < n$ the
left-hand side of (5.46) yields

$$\int_a^b Q_m(x)\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n}{dx}\right]dx = -\int_a^b s(x)w(x)\frac{dQ_n}{dx}\frac{dQ_m}{dx}\,dx = \int_a^b w(x)Q_n(x)\left[\frac{1}{w(x)}\frac{d}{dx}\left(s(x)w(x)\frac{dQ_m}{dx}\right)\right]dx = 0.$$

We have used the condition that $s(a)w(a) = s(b)w(b) = 0$, which is
assumption 3 in Sect. 5.2.1. We also used the fact that $Q_n(x)$ is orthogonal to any
polynomial of degree $< n$. Consequently, we arrive at the result

$$\lambda_n^{(m)} = 0 \quad \text{for } m < n.$$

Setting

$$\lambda_n^{(n)} = \lambda_n$$

for simplicity, we can rewrite (5.45) in the form

$$\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n}{dx}\right] = -w(x)\lambda_n Q_n(x), \tag{5.47}$$

which is the differential equation satisfied by a polynomial $Q_n(x)$. The
constant $\lambda_n$ can be found by setting $m = n$ in (5.46) and integrating, as we
demonstrate later in Exercise 4.


5.2.7 Generating Functions (I) 


As a matter of fact, all the orthogonal polynomials $Q_n(x)$ discussed thus far
can be generated from a single function $g(t,x)$ of two variables by repeated
differentiation with respect to $t$. Called a generating function, it plays a
significant role in many areas of mathematics. Here we study the essence of
generating functions together with several examples by which we can derive
specific orthogonal polynomials.

A formal definition of generating functions is given below.

♠ Generating function:

Assume a (finite or infinite) convergent power series

$$\gamma(t) = \sum_k f_k t^k.$$

Then $\gamma(t)$ is called a generating function for the sequence of coefficients

$$f_1, f_2, \cdots, f_n, \cdots.$$

Clearly, all the coefficients $f_n$ are obtained by differentiating $\gamma(t)$ as given
by

$$f_n = \frac{1}{n!}\left.\frac{d^n\gamma}{dt^n}\right|_{t=0}.$$

For orthogonal polynomials, generating functions are assumed to take the
form

$$g(t,x) = \sum_{n=0}^{\infty} A_n Q_n(x)t^n, \tag{5.48}$$

where $Q_n(x)$ is an orthogonal polynomial associated with $g(t,x)$, and the $A_n$
are appropriate constants. The explicit form of $g(t,x)$ can often be derived
using the Rodrigues formula and Cauchy's integral formula (see Sect. 7.3.1).


Remember that the latter formula determines an $n$th-order derivative of a
function $f(z)$ as

$$\frac{d^n}{dz^n}f(z) = \frac{n!}{2\pi i}\oint_C\frac{f(\zeta)\,d\zeta}{(\zeta - z)^{n+1}},$$

where $f(z)$ is analytic within the closed contour $C$. (See Sect. 7.1.2 for a
definition of analytic functions.) Applying this to the Rodrigues formula
for, say, Hermite polynomials $H_n(x)$, we obtain

$$H_n(x) = (-1)^n e^{x^2/2}\frac{d^n}{dx^n}e^{-x^2/2} = (-1)^n e^{x^2/2}\frac{n!}{2\pi i}\oint_C\frac{e^{-\zeta^2/2}\,d\zeta}{(\zeta - x)^{n+1}}.$$

We then try to sum the series as

$$\sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!} = \frac{e^{x^2/2}}{2\pi i}\oint_C\sum_{n=0}^{\infty}\frac{(-t)^n}{(\zeta - x)^{n+1}}e^{-\zeta^2/2}\,d\zeta = \frac{e^{x^2/2}}{2\pi i}\oint_C\frac{e^{-\zeta^2/2}}{\zeta - x + t}\,d\zeta,$$

where we require that the point $x - t$ be inside the contour. Finally we evaluate
the above integral and find

$$e^{tx - (t^2/2)} = \sum_{n=0}^{\infty}\frac{H_n(x)t^n}{n!}.$$

Comparing this last equation with (5.48), we see that $e^{tx - (t^2/2)}$ is the
generating function associated with Hermite polynomials $H_n(x)$. Similarly, we can
derive the generating function for Laguerre polynomials as

$$\frac{e^{-tx/(1-t)}}{(1 - t)^{\nu + 1}} = \sum_{n=0}^{\infty}L_n^{\nu}(x)t^n.$$
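
The Hermite generating function can be verified numerically. The sketch below assumes the convention with weight $e^{-x^2/2}$ used in the derivation above, for which the three-term recurrence is $H_{n+1}(x) = xH_n(x) - nH_{n-1}(x)$ (a standard fact for this convention, stated here as an assumption); the function names are illustrative.

```python
import math

def hermite_values(n_max, x):
    """Return [H_0(x), ..., H_n_max(x)] for the weight exp(-x^2/2) convention."""
    h = [1.0, x]
    for n in range(1, n_max):
        h.append(x * h[n] - n * h[n - 1])   # H_{n+1} = x H_n - n H_{n-1}
    return h[:n_max + 1]

def generating_series(t, x, n_max=40):
    """Partial sum of sum_n H_n(x) t^n / n!."""
    h = hermite_values(n_max, x)
    return sum(h[n] * t ** n / math.factorial(n) for n in range(n_max + 1))

if __name__ == "__main__":
    t, x = 0.3, 1.2
    print(generating_series(t, x), math.exp(t * x - t * t / 2))
```

The two printed numbers agree to rounding error, since the factor $1/n!$ makes the series converge for every $t$.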


5.2.8 Generating Functions (II) 


There is an alternative way to determine a generating function, which is based
on the recurrence formula for a particular polynomial. To see this, we try to
find the generating function of the Legendre polynomials that satisfy the
following recursion formula:

$$(n + 1)P_{n+1}(x) - (2n + 1)xP_n(x) + nP_{n-1}(x) = 0,$$

with $P_0(x) = 1$, $P_1(x) = x$, and for convenience we set $P_{-1}(x) = 0$. We seek
an expression in closed form for

$$g(t,x) = \sum_{n=0}^{\infty}P_n(x)t^n.$$

First we note that

$$\frac{\partial g}{\partial t} = \sum_{n=0}^{\infty}nP_n(x)t^{n-1} = \sum_{n=0}^{\infty}(n + 1)P_{n+1}(x)t^n.$$

By straightforward rearrangement we find that

$$\frac{\partial g}{\partial t} = xg(t,x) + 2xt\frac{\partial g}{\partial t} - tg(t,x) - t^2\frac{\partial g}{\partial t},$$

which leads to the partial differential equation

$$\frac{1}{g}\frac{\partial g}{\partial t} = \frac{x - t}{1 - 2tx + t^2}.$$

Coupled with the initial condition $g(0,x) = 1$ we finally have

$$g(t,x) = (1 - 2tx + t^2)^{-1/2}.$$

Generating functions for other orthogonal polynomials are given in
Appendix D.


Exercises 


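1. Find the recurrence formula for normalized polynomials $Q_n(x)$.
Solution: When the polynomials are normalized, we have $I_n = 1$ $(n =
0, 1, 2, \cdots)$ from (5.39), so that (5.44) gives $c_n = \xi_{n+1}\xi_{n-1}/\xi_n^2 = a_n/a_{n-1}$.
The recurrence formula (5.36) is then

$$Q_{n+1}(x) = (a_n x + b_n)Q_n(x) - \frac{a_n}{a_{n-1}}Q_{n-1}(x). \quad ♣$$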


2. Assume that a sequence of orthogonal polynomials satisfies

$$Q_{n+1}(x) = [(n + 1)x + 1]\,Q_n(x) - 3(n + 1)Q_{n-1}(x).$$

Find the normalization constants for $Q_n(x)$ defined by $\tilde{Q}_n(x) = \lambda_n Q_n(x)$,
where $\tilde{Q}_n(x)$ are normalized polynomials.

Solution: We denote the normalized polynomials as $\tilde{Q}_n(x) =
\lambda_n Q_n(x)$, where the constants $\lambda_n$ $(n = 0, 1, 2, \cdots)$ are to be found.
Substituting $Q_n(x) = \tilde{Q}_n(x)/\lambda_n$ into the given formula, we have

$$\tilde{Q}_{n+1}(x) = \frac{\lambda_{n+1}}{\lambda_n}[(n + 1)x + 1]\,\tilde{Q}_n(x) - 3(n + 1)\frac{\lambda_{n+1}}{\lambda_{n-1}}\tilde{Q}_{n-1}(x).$$

Comparing this with the normalized recurrence formula from
Exercise 1, we have the relation $(3\lambda_n)/\lambda_{n-1} = \lambda_{n-1}/(n\lambda_n)$, which
yields $\lambda_n = \lambda_{n-1}/\sqrt{3n}$. This relation gives the normalization
constants of the form

$$\lambda_n = \frac{\lambda_0}{(n!)^{1/2}\,3^{n/2}}. \quad ♣$$


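3. Find the recurrence formula for Hermite and Legendre polynomials.
Solution: For Hermite and Legendre polynomials, (5.36) reads

$$H_{n+1}(x) = 2xH_n(x) - 2nH_{n-1}(x) \tag{5.49}$$

and

$$(n + 1)P_{n+1}(x) = (2n + 1)xP_n(x) - nP_{n-1}(x), \tag{5.50}$$

respectively. Note that (5.49) is the form belonging to the weight $e^{-x^2}$;
with the weight $e^{-x^2/2}$ adopted in Sect. 5.2.2, the recurrence reads
$H_{n+1}(x) = xH_n(x) - nH_{n-1}(x)$. See Appendix D for the recurrence relations
associated with the other polynomials we have discussed. ♣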


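4. Determine the constants $\lambda_n$ given in (5.47).
Solution: Setting $m = n$ on the left-hand side of (5.46), we obtain

$$\int_a^b Q_n(x)\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n}{dx}\right]dx \tag{5.51}$$

$$= \int_a^b Q_n(x)\left[\frac{d(sw)}{dx}\frac{dQ_n}{dx} + s(x)w(x)\frac{d^2Q_n}{dx^2}\right]dx$$

$$= \int_a^b w(x)Q_n(x)\left[K_1Q_1(x)\frac{dQ_n}{dx} + s(x)\frac{d^2Q_n}{dx^2}\right]dx. \tag{5.52}$$

Here we used the relation $d(sw)/dx = wK_1Q_1$ [set $n = 1$ in the
general Rodrigues formula (5.26)]. The orthogonality of $Q_n(x)$
means that only the $n$th power of $x$ in the square brackets
contributes to the integral in the last line of (5.52). [See (5.30) for
details of the orthogonal property of $Q_n(x)$.] We then set up the
following expressions:

$$s(x) = ax^2 + bx + c, \qquad Q_n(x) = \xi_n x^n + \eta_n x^{n-1} + \cdots, \qquad Q_1(x) = \xi_1 x + \eta_1,$$

which result in

$$K_1Q_1\frac{dQ_n}{dx} = K_1\xi_1 n\,\xi_n x^n + (\text{const.}) \times x^{n-1} + \cdots$$

and

$$s\frac{d^2Q_n}{dx^2} = an(n - 1)\xi_n x^n + (\text{const.}) \times x^{n-1} + \cdots.$$

Thus the relevant terms in the square brackets in the last line in
(5.52) become

$$\left[K_1\frac{dQ_1}{dx}n + \frac{1}{2}\frac{d^2s}{dx^2}n(n - 1)\right]\xi_n x^n,$$

where we used $\xi_1 = dQ_1/dx$ and $a = (1/2)(d^2s/dx^2)$, and we get

$$\int_a^b Q_n(x)\frac{d}{dx}\left[s(x)w(x)\frac{dQ_n}{dx}\right]dx = n\left[K_1\frac{dQ_1}{dx} + \frac{n - 1}{2}\frac{d^2s}{dx^2}\right]\xi_n\int_a^b w(x)Q_n(x)x^n\,dx = n\left[K_1\frac{dQ_1}{dx} + \frac{n - 1}{2}\frac{d^2s}{dx^2}\right]I_n,$$

where the last step follows from (5.41). Comparing this with the right-hand
side of (5.46) for $m = n$, namely $-\lambda_n I_n$, we obtain

$$\lambda_n = -n\left[K_1\frac{dQ_1}{dx} + \frac{n - 1}{2}\frac{d^2s}{dx^2}\right]. \quad ♣$$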


5.3 Chebyshev Polynomials 


5.3.1 Minimax Property 


Thus far we have seen that every real function $f(x)$ defined in a certain interval
(finite or infinite) can be approximated in the mean by appropriate orthogonal
polynomials $\{Q_n(x)\}$ as

$$f(x) \simeq \sum_{i=0}^{n}c_iQ_i(x). \tag{5.53}$$

The coefficients $c_i$ are determined formally by using the orthogonality of the
polynomials in question. The striking advantage of such polynomial
approximations is that an improvement in the approximation through addition of an
extra term $c_{n+1}Q_{n+1}(x)$ does not affect the previously obtained coefficients,
$c_0, c_1, \cdots, c_n$.

In principle, any of the polynomials that we discussed in Sect. 5.2 can be used
for the approximation (5.53). From the point of view of numerical analysis, however,
the Chebyshev polynomials $\{T_n(x)\}$ are the best choice, primarily because the
resulting approximation has very nearly the smallest maximum deviation from the
true function $f(x)$ over the whole domain $[-1, 1]$. This
property, which distinguishes Chebyshev polynomials from the other families, is
known as the minimax property. In general, polynomials endowed with the minimax
property are very difficult to find, but fortunately, the Chebyshev polynomials
fall into this category and, moreover, are easy to compute.

To show the minimax property of Chebyshev polynomials, we have to be
aware of two of their other properties. The first is a concise formula for $T_n(x)$
that is an alternative to those based on the Rodrigues formula.


♠ Concise formula for Chebyshev polynomials:

$$T_n(x) = \cos\left(n\cos^{-1}x\right) \quad (n = 0, 1, 2, \cdots). \tag{5.54}$$

The derivation of (5.54) requires some lengthy calculations, so we put it in
the next subsection (see Sect. 5.3.2). Equation (5.54) implies that each $T_n(x)$
has $n$ zeros in the interval $[-1, 1]$, which are located at the points

$$x = \cos\left[\frac{\pi}{n}\left(k - \frac{1}{2}\right)\right] \quad (k = 1, 2, \cdots, n). \tag{5.55}$$

In this same interval, there are $n + 1$ extrema (maxima and minima), located
at

$$x = \cos\left(\frac{\pi}{n}k\right) \quad (k = 0, 1, \cdots, n).$$

Note that $T_n(x) = 1$ at all of the maxima, whereas $T_n(x) = -1$ at all of the
minima. This feature of $T_n$ is exactly what makes the Chebyshev polynomials
so useful in polynomial approximation of functions.

Remark. Equation (5.54) combined with trigonometric identities can yield
explicit expressions for $T_n(x)$:

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_2(x) = 2x^2 - 1, \quad T_3(x) = 4x^3 - 3x, \cdots,$$

and more generally,

$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \quad (n \ge 1).$$

The last expression is a special case of the general recurrence formula (5.40)
derived in Sect. 5.2.3.
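
The statements about zeros, extrema, and the recurrence can be checked directly from the closed form (5.54). The sketch below is a minimal illustration with hypothetical function names.

```python
import math

def chebyshev_t(n, x):
    """T_n(x) = cos(n arccos x) for x in [-1, 1], cf. (5.54)."""
    return math.cos(n * math.acos(x))

def chebyshev_zeros(n):
    """The n zeros of T_n, cf. (5.55)."""
    return [math.cos(math.pi * (k - 0.5) / n) for k in range(1, n + 1)]

def chebyshev_extrema(n):
    """The n + 1 points in [-1, 1] where T_n equals +1 or -1."""
    return [math.cos(math.pi * k / n) for k in range(n + 1)]

def chebyshev_by_recurrence(n, x):
    """T_n(x) from T_{n+1} = 2x T_n - T_{n-1}, starting with T_0 = 1 and T_1 = x."""
    t_prev, t_curr = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

if __name__ == "__main__":
    print([round(chebyshev_t(5, z), 12) for z in chebyshev_zeros(5)])
    print([round(chebyshev_t(5, e), 12) for e in chebyshev_extrema(5)])
    print(chebyshev_by_recurrence(7, 0.3), chebyshev_t(7, 0.3))
```

The recurrence is the preferred way to evaluate $T_n$ outside $[-1, 1]$, where the arccosine form is not directly applicable.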


The second property of Chebyshev polynomials to be noted is the discrete
orthogonality relation described below. (The proof is given in Sect. 5.3.3.)

♠ Discrete orthogonal relation:

If $x_k$ $(k = 1, \cdots, n)$ are the $n$ zeros of $T_n(x)$ given by (5.55) and $i, j < n$,
then

$$\sum_{k=1}^{n}T_i(x_k)T_j(x_k) = \begin{cases} 0, & i \ne j, \\ n/2, & i = j \ne 0, \\ n, & i = j = 0. \end{cases} \tag{5.56}$$

From (5.54) and (5.56), we obtain the following theorem:

♠ Theorem:

Suppose $f(x)$ to be an arbitrary function in the interval $[-1, 1]$ and
define $c_j$ $(j = 1, \cdots, n)$ by

$$c_j = \frac{2}{n}\sum_{k=1}^{n}f(x_k)T_{j-1}(x_k), \tag{5.57}$$

where $x_k$ is the $k$th zero of $T_n(x)$ given by (5.55). We then have

$$f(x) = \sum_{k=1}^{n}c_kT_{k-1}(x) - \frac{c_1}{2} \quad \text{for all } x = x_k. \tag{5.58}$$

What is remarkable is the fact that for $x = x_k$, the finite sum in (5.58) is
equal to $f(x)$ exactly. For $x \ne x_k$, the sum in (5.58) just approximates $f(x)$;
nevertheless the error can be reduced by increasing the degree $n$ of the sum.
Moreover, for practical use, we can truncate the sum in (5.58) to a much
lower degree, for even if we do so, the approximation (5.58) is sufficiently
accurate over the whole interval $[-1, 1]$, not only at the zeros of $T_n(x)$. This is
in contrast to the case of approximations based on other polynomials, where
the degree of summation $n$ should be taken as large as possible to obtain
high accuracy. In fact, this truncation capability is the reason Chebyshev
polynomial expansion is far better than the other choices.

To examine the above statement, let us suppose that $n$ is so large that
(5.58) is virtually a perfect approximation of $f(x)$. We then consider the
truncated approximation

$$f(x) \simeq \sum_{k=1}^{m}c_kT_{k-1}(x) - \frac{c_1}{2} \quad \text{with } m < n, \tag{5.59}$$

where the coefficients $c_k$ are given in (5.57). The difference between (5.58)
and (5.59) is given by

$$\sum_{k=m+1}^{n}c_kT_{k-1}(x), \tag{5.60}$$

which can be no larger in magnitude than the sum of the neglected $|c_k|$'s, as the
$T_n(x)$'s are all bounded between $\pm 1$.

Now we consider the magnitude of the sum (5.60). We know that in general
the $c_k$'s decrease rapidly with $k$, which follows intuitively from the definition
(5.57). Hence, the magnitude of (5.60) is dominated by the term $c_{m+1}T_m(x)$,
which is much less than unity for all $x \in [-1, 1]$. In addition, $c_{m+1}T_m(x)$ is an
oscillatory function with $m + 1$ equal extrema distributed almost uniformly
over the interval $[-1, 1]$. These two features of the dominant term $c_{m+1}T_m(x)$
result in smooth spreading out of the error of the approximation (5.59). This
implies that the Chebyshev approximation (5.59) is very nearly the
same as the minimax polynomial that has the smallest maximum deviation
from the true function $f(x)$.
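
The procedure of (5.57)-(5.59) can be sketched in a few lines. The code below computes the coefficients from the zeros of $T_n$, evaluates the full and truncated sums, and uses $f(x) = e^x$ purely as an assumed test function; note that list index `j` holds the coefficient $c_{j+1}$ of the text.

```python
import math

def chebyshev_coefficients(f, n):
    """Coefficients of (5.57); coeffs[j] corresponds to c_{j+1}, j = 0, ..., n-1."""
    zeros = [math.cos(math.pi * (k - 0.5) / n) for k in range(1, n + 1)]
    return [
        2.0 / n * sum(f(x) * math.cos(j * math.acos(x)) for x in zeros)
        for j in range(n)
    ]

def chebyshev_eval(coeffs, x, m=None):
    """Sum of the first m terms of (5.58)/(5.59) minus c_1/2; m defaults to all terms."""
    m = len(coeffs) if m is None else m
    theta = math.acos(x)
    return sum(coeffs[k] * math.cos(k * theta) for k in range(m)) - 0.5 * coeffs[0]

if __name__ == "__main__":
    n = 12
    coeffs = chebyshev_coefficients(math.exp, n)
    grid = [-1 + 2 * i / 200 for i in range(201)]
    for m in (4, 6, 8):
        worst = max(abs(chebyshev_eval(coeffs, x, m) - math.exp(x)) for x in grid)
        print(m, worst, abs(coeffs[m]))   # error is close to the first neglected coefficient
```

For this smooth test function the maximum error of the truncated sum tracks the magnitude of the first neglected coefficient, which is the behavior described above.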


5.3.2 A Concise Representation 


The aim here is to derive the alternative representation of Chebyshev
polynomials given in (5.54):

$$T_n(x) = \cos\left[n\cos^{-1}(x)\right].$$

We know that Chebyshev polynomials satisfy the relation

$$(1 - x^2)\frac{d^2}{dx^2}T_n(x) - x\frac{d}{dx}T_n(x) + n^2T_n(x) = 0,$$

which can be rewritten in the form

$$\frac{d}{dx}\left(\sqrt{1 - x^2}\,\frac{d}{dx}T_n(x)\right) + \frac{n^2}{\sqrt{1 - x^2}}T_n(x) = 0. \tag{5.61}$$

We now apply the following lemma:

♠ Lemma:

Let $p(x)$ and $q(x)$ be two positive, continuously differentiable functions,
and let $y(x)$ satisfy the differential equation

$$\frac{d}{dx}\left[p(x)\frac{d}{dx}y(x)\right] + q(x)y(x) = 0. \tag{5.62}$$

If the product $p(x)q(x)$ is nonincreasing (or nondecreasing), then the
relative maxima of $[y(x)]^2$ form a nondecreasing (nonincreasing) set.


(The proof of this lemma is outlined in Exercise 1.) We can see that if

$$p(x) = \sqrt{1 - x^2} \quad \text{and} \quad q(x) = \frac{n^2}{\sqrt{1 - x^2}},$$

(5.61) corresponds to (5.62), which implies that the product $pq$ is constant.
Thus, according to the lemma, all relative maxima of $T_n^2(x)$ must assume the
same value.

Now we seek a polynomial $T_n(x)$ of degree $n$ that satisfies the condition

$$T_n^2(x) = 1 \quad \text{whenever} \quad T_n'(x) = 0.$$

That is, $T_n^2(x) = 1$ at all $x$ where $T_n^2(x)$ has a relative maximum.
Clearly at these points, both $T_n^2(x) - 1$ and $[T_n'(x)]^2$ have double zeros. Then
the function

$$\frac{T_n^2(x) - 1}{[T_n'(x)]^2} \tag{5.63}$$

is a rational function and all the zeros of the denominator also occur in the
numerator. If we compare the degree of the polynomials in the denominator
and in the numerator, it follows that (5.63) is a quadratic, and without loss
of generality we have

$$\frac{T_n^2(x) - 1}{[T_n'(x)]^2} = a(x^2 - 1). \tag{5.64}$$

The constant $a$ can be determined by dividing both sides by $x^2$ and letting
$x$ approach infinity. Then, inserting a polynomial of degree $n$ for $T_n(x)$, we
obtain

$$\frac{1}{n^2} = a,$$

which yields

$$\frac{1 - x^2}{n^2}\left(\frac{dT_n}{dx}\right)^2 = 1 - T_n^2. \tag{5.65}$$

Equation (5.65) is a differential equation for $T_n(x)$ that determines the
explicit form of our desired $T_n(x)$. To solve it, we set

$$T_n(x) = \cos\theta, \quad x = \cos\phi,$$

where $\theta$ and $\phi$ are functions of $x$. We then have

$$T_n^2(x) - 1 = -\sin^2\theta$$

and

$$\frac{dT_n}{dx} = \left(\frac{d}{d\theta}\cos\theta\right)\frac{d\theta}{d\phi}\frac{d\phi}{dx} = \frac{\sin\theta}{\sin\phi}\frac{d\theta}{d\phi}.$$

Substituting these in (5.64) yields

$$\left(\frac{d\theta}{d\phi}\right)^2 = n^2 \quad \text{so that} \quad \theta = \pm n\phi + c,$$

and we get

$$T_n(x) = \cos\left(n\cos^{-1}x + c\right).$$

To determine $c$, we note that

$$T_n(+1) = 1 = \cos(c).$$

Hence, $c = 0$ and we eventually obtain

$$T_n(x) = \cos\left(n\cos^{-1}x\right). \tag{5.66}$$


5.3.3 Discrete Orthogonality Relation


We close this section by proving the discrete orthogonality relation (5.56) for
Chebyshev polynomials.

Proof (of the discrete orthogonality relation): Let $x_k$ $(k =
1, 2, \cdots, n)$ be the $n$ zeros of $T_n(x)$, which are given by

$$x_k = \cos\left[\frac{\pi}{n}\left(k - \frac{1}{2}\right)\right].$$

Then the value of $T_\ell(x)$ at $x = x_k$, in which $\ell < n$ is assumed, reads

$$T_\ell(x_k) = \cos\left[\ell\cos^{-1}(x_k)\right] = \cos\left[\frac{\ell\pi}{n}\left(k - \frac{1}{2}\right)\right].$$

Using the trigonometric identity, we have for $\ell, m < n$,

$$T_\ell(x_k)T_m(x_k) = \frac{1}{2}\cos\left[\frac{(\ell + m)\pi}{2n}(2k - 1)\right] + \frac{1}{2}\cos\left[\frac{(\ell - m)\pi}{2n}(2k - 1)\right]. \tag{5.67}$$

If $\ell = m = 0$, this equals 1 so that we obtain

$$\sum_{k=1}^{n}T_0(x_k)T_0(x_k) = n. \tag{5.68}$$

Otherwise, if $\ell = m \ne 0$, the second term in the last line of (5.67)
equals $1/2$ and we have

$$\sum_{k=1}^{n}T_\ell(x_k)T_\ell(x_k) = \frac{n}{2} + \frac{1}{2}\sum_{k=1}^{n}\cos\left[\frac{\ell\pi}{n}(2k - 1)\right] = \frac{n}{2} + \frac{\sin(2\ell\pi)}{4\sin(\ell\pi/n)} = \frac{n}{2}, \tag{5.69}$$

where we used the equation (see Exercise 2)

$$\sum_{k=1}^{n}\cos(2k - 1)x = \frac{\sin 2nx}{2\sin x} \quad (\text{for } x \ne 0).$$

In a similar manner, for the case $\ell \ne m$ we find that

$$\sum_{k=1}^{n}T_\ell(x_k)T_m(x_k) = 0. \tag{5.70}$$

Equations (5.68), (5.69), and (5.70) together are identical to the
desired result given in (5.56). ♣
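
The three cases of (5.56) are easy to confirm numerically. The following sketch sums the products over the zeros of $T_n$ using the closed form $T_\ell(x_k) = \cos[\ell\pi(k - 1/2)/n]$ obtained in the proof; the function name is illustrative.

```python
import math

def discrete_inner_product(i, j, n):
    """Sum of T_i(x_k) T_j(x_k) over the n zeros x_k of T_n, for i, j < n."""
    total = 0.0
    for k in range(1, n + 1):
        theta_k = math.pi * (k - 0.5) / n   # x_k = cos(theta_k), cf. (5.55)
        total += math.cos(i * theta_k) * math.cos(j * theta_k)
    return total

if __name__ == "__main__":
    n = 8
    print(discrete_inner_product(0, 0, n))   # case i = j = 0
    print(discrete_inner_product(3, 3, n))   # case i = j != 0
    print(discrete_inner_product(2, 5, n))   # case i != j
```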


Exercises 
1. Prove the lemma associated with the differential equation (5.62).

Solution: The proof is based on the nondecreasing property of the
function defined by

$$f(x) = [y(x)]^2 + \frac{[p(x)y'(x)]^2}{p(x)q(x)},$$

in which the functions $y(x)$, $p(x)$, and $q(x)$ are assumed to satisfy
the differential equation (5.62). The nondecreasing property of
$f(x)$ is verified by examining its derivative:

$$f'(x) = 2yy' + \frac{2py'(py')'}{pq} - \frac{(pq)'}{(pq)^2}(py')^2 = -\frac{(pq)'}{(pq)^2}(py')^2,$$

where we used the condition (5.62) in the form $(py')' = -qy$. From hypothesis,
$pq$ is nonincreasing, which implies $(pq)' \le 0$. Hence, it is readily seen
that $f' \ge 0$, i.e., that $f$ is nondecreasing.

Now we realize that $y'$ must vanish wherever $y(x)^2$ has a
relative maximum, so that $f(x) = y^2$ there. Suppose that $x_1$ and $x_2$ are
two successive zeros of $y'$, such that $x_1 < x_2$. Since $f(x)$ is
nondecreasing, we have $f(x_2) \ge f(x_1)$, or equivalently, $y^2(x_2) \ge y^2(x_1)$,
which means that the relative maxima of $y^2$ form a nondecreasing
set. This completes the proof of the lemma. ♣


2. Prove that $\displaystyle\sum_{k=1}^{N}\cos(2k - 1)x = \frac{\sin 2Nx}{2\sin x}$ (for $x \ne 0$).

Solution: This equation is obtained by considering the sum

$$\sum_{k=1}^{N}e^{i(2k-1)x} = e^{-ix}\sum_{k=1}^{N}e^{2ikx} = e^{-ix}\left(\frac{1 - e^{2i(N+1)x}}{1 - e^{2ix}} - 1\right) = e^{i(N-1)x}\,\frac{\sin(N + 1)x}{\sin x} - e^{-ix}.$$

Taking the real part of both sides yields

$$\sum_{k=1}^{N}\cos(2k - 1)x = \cos(N - 1)x \cdot \frac{\sin(N + 1)x}{\sin x} - \cos x = \frac{\sin 2Nx + \sin 2x}{2\sin x} - \frac{2\sin x\cos x}{2\sin x} = \frac{\sin 2Nx}{2\sin x}. \quad ♣$$

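3. Derive the formula for Chebyshev polynomials:

$$\frac{1 - t^2}{1 - 2tx + t^2} = 1 + 2\sum_{n=1}^{\infty}T_n(x)t^n,$$

where $|t| < 1$. Then, using this equation, prove that

$$\int_0^{2\pi}\frac{\cos n\theta\,d\theta}{1 - 2t\cos\theta + t^2} = \frac{2\pi t^n}{1 - t^2},$$

where $n \ge 0$.
Solution: Setting $x = \cos\theta$, so that $T_m(x) = \cos m\theta$, it follows that

$$1 + 2\sum_{m=1}^{\infty}t^m\cos m\theta = -1 + 2\,\mathrm{Re}\sum_{m=0}^{\infty}t^me^{im\theta} = -1 + 2\,\mathrm{Re}\,\frac{1}{1 - te^{i\theta}} = \frac{1 - t^2}{1 - 2tx + t^2},$$

which is the desired result. The next equation is found from the Fourier
cosine series, where the coefficients can be obtained from

$$a_n = \frac{1}{\pi}\int_0^{2\pi}\frac{(1 - t^2)\cos n\theta}{1 - 2t\cos\theta + t^2}\,d\theta = 2t^n. \quad ♣$$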


5.4 Applications in Physics and Engineering 

5.4.1 Quantum-Mechanical State in a Harmonic Potential

We now consider the application of Hermite polynomials $H_n(x)$ to physical
systems in the theory of quantum mechanics. We know that $H_n(x)$ satisfies
the following second-order differential equation:

$$H_n''(x) - xH_n'(x) + nH_n(x) = 0.$$

Let us introduce the related function

$$U_n(x) = e^{-x^2/4}H_n(x). \tag{5.71}$$

A simple calculation shows that

$$U_n''(x) + \left(n + \frac{1}{2} - \frac{x^2}{4}\right)U_n(x) = 0. \tag{5.72}$$

This equation is similar in form to the Schrödinger equation for a quantum
particle whose motion is confined to a harmonic potential well. In fact, the
Schrödinger equation is given by

$$\psi''(x) + \left(E - x^2\right)\psi(x) = 0, \tag{5.73}$$

where $\psi(x)$ is the quantum wave function whose squared magnitude at the position
$x = a$, namely, $|\psi(a)|^2$, represents the probability density of the quantum
particle being observed at $x = a$. The similarity between (5.72) and (5.73)
implies that the function defined by (5.71), i.e., the product of $H_n(x)$ and
$e^{-x^2/4}$, behaves as a wave function that describes the quantum particle in the
potential well.

However, it should be noted that solutions of (5.73) do not always satisfy
the condition

$$\int_{-\infty}^{\infty}|\psi(x)|^2\,dx < \infty, \tag{5.74}$$

which must be satisfied for the solutions to be physically meaningful. By
comparing (5.73) with (5.72), we see that whenever

$$E = E_n = 2n + 1, \tag{5.75}$$

we have

$$\psi_n(x) = c_ne^{-x^2/2}H_n\left(\sqrt{2}\,x\right),$$

which clearly satisfies the condition (5.74) if the constants $c_n$ are chosen
appropriately. Furthermore, the uniqueness theorem for solutions of ordinary
differential equations (see Sect. 15.2.4) guarantees that the values of $E$ given
in (5.75) are the only ones for which (5.73) has solutions satisfying (5.74).
These specific values of $E$ are called the eigenenergies of the system, and
the corresponding solutions $\psi_n(x)$ are called eigenfunctions.
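
The intermediate steps behind (5.72), and the rescaling that connects it with (5.73) and (5.75), can be written out as follows. This is a sketch of the omitted algebra under the definitions above, not a quotation from the text.

```latex
\begin{aligned}
U_n'  &= e^{-x^2/4}\left(H_n' - \tfrac{x}{2}H_n\right),\\
U_n'' &= e^{-x^2/4}\left(H_n'' - xH_n' - \tfrac{1}{2}H_n + \tfrac{x^2}{4}H_n\right),\\
U_n'' + \left(n + \tfrac{1}{2} - \tfrac{x^2}{4}\right)U_n
      &= e^{-x^2/4}\left(H_n'' - xH_n' + nH_n\right) = 0,\\[4pt]
\psi_n(x) = U_n\!\left(\sqrt{2}\,x\right)
      \;\Longrightarrow\;
\psi_n''(x) &= 2\,U_n''\!\left(\sqrt{2}\,x\right)
      = -\left(2n + 1 - x^2\right)\psi_n(x).
\end{aligned}
```

The last line shows that $\psi_n(x) = e^{-x^2/2}H_n(\sqrt{2}\,x)$ solves (5.73) precisely when $E = 2n + 1$.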


5.4.2 Electrostatic potential generated by a multipole 


Next, we briefly discuss the use of Legendre polynomials in describing 
the electrostatic potential field generated by a multipole. For simplicity, we 
first consider an electric dipole, i.e., a pair of positive and negative charges 
separated by an infinitesimal distance h. We choose our coordinate system 
such that both charges are located on the x-axis with the negative charge at 
the origin. The magnitude of the charges is taken to be +(1/h). Then, the 
electrostatic potential field 2(P) with respect to a point P on the sphere 
x? +y?4+ 22 =r? is represented as 


5.4 Applications in Physics and Engineering 137 


1 1 
@5(P) = lim 
2( ) 1 (SEES ace | 


afi _ & 
~ Ox\r)} 7° 


Therefore, when r = 1, we have 


r=1,.> 
where P,,() is a Legendre polynomial. 

Similar descriptions can be presented for high-degree multipoles. The
potential $\Phi_4(P)$ of a quadrupole is determined as follows: Consider a double
negative charge $-(2/h^2)$ located at the origin and two positive charges $1/h^2$
located at the points $(x,y,z) = (\pm h,0,0)$. Then, the associated potential
$\Phi_4(P)$ at a point on a sphere of radius $r$ is given by

$$\Phi_4(P) = \lim_{h\to 0}\frac{1}{h^2}\left(\frac{1}{\sqrt{(x+h)^2+y^2+z^2}} - \frac{2}{\sqrt{x^2+y^2+z^2}} + \frac{1}{\sqrt{(x-h)^2+y^2+z^2}}\right) = \frac{\partial^2}{\partial x^2}\left(\frac{1}{r}\right) = -\frac{r^2-3x^2}{r^5},$$

so for $r = 1$,

$$\Phi_4(P)\big|_{r=1} = -1 + 3x^2 = P_2(x)\cdot 2!.$$
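The quadrupole limit can be checked numerically. The following is a minimal sketch, assuming a small but finite separation `h` in place of the limit $h \to 0$ and an arbitrarily chosen sample point on the unit sphere; the function names are illustrative and do not come from the text.

```python
import math

def quadrupole_potential(x, y, z, h):
    """Finite-h potential of charges +1/h^2 at (+-h, 0, 0) and -2/h^2 at the origin."""
    plus = 1.0 / math.sqrt((x + h) ** 2 + y ** 2 + z ** 2)
    centre = 1.0 / math.sqrt(x ** 2 + y ** 2 + z ** 2)
    minus = 1.0 / math.sqrt((x - h) ** 2 + y ** 2 + z ** 2)
    return (plus - 2.0 * centre + minus) / h ** 2

def legendre_p2(x):
    """Second Legendre polynomial P_2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x ** 2 - 1.0)

# Sample point on the unit sphere r = 1 (an arbitrary choice for illustration).
x, y, z = 0.6, 0.8, 0.0
h = 1e-4  # finite separation standing in for the limit h -> 0

numeric = quadrupole_potential(x, y, z, h)
exact = 2.0 * legendre_p2(x)  # P_2(x) * 2! = 3x^2 - 1

print(numeric, exact)
```

The finite-difference value agrees with $P_2(x)\cdot 2!$ up to the truncation and rounding errors expected for this choice of `h`.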


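Similarly, for an octapole, we get

$$\Phi_8(P)\big|_{r=1} = \frac{\partial^3}{\partial x^3}\left(\frac{1}{r}\right)\bigg|_{r=1} = -15x^3 + 9x = -P_3(x)\cdot 3!,$$

and in general

$$\Phi_{2^n}(P)\big|_{r=1} = \frac{\partial^n}{\partial x^n}\left(\frac{1}{r}\right)\bigg|_{r=1} = (-1)^n P_n(x)\cdot n!.$$

The final result tells us that the potential of a $2^n$-pole is described by the
product of the Legendre polynomial $P_n(x)$ and the factor $(-1)^n\cdot n!$. By solving
the previous equation for $P_n(x)$, we obtain the following expression for the
$n$th Legendre polynomial:

$$P_n(x) = \frac{(-1)^n}{n!}\,\frac{\partial^n}{\partial x^n}\left(\frac{1}{r}\right)\bigg|_{r=1}.$$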


6 


Lebesgue Integrals 


Abstract The concept of “measure” (Sect. 6.1.2) is important for an understanding 
of the theory of the Lebesgue integral. A measure is a generalization of the concept of 
length that allows us to quantify the length of a set that is composed of, for instance, 
an infinite number of infinitesimal points with a highly discontinuous distribution. 
Thus, the Lebesgue integral is an effective tool for integrating highly discontinuous 
functions that cannot be integrated using conventional Riemann integrals. 


6.1 Measure and Summability 


6.1.1 Riemann Integral Revisited 


It is certain that the Riemann integral is adequate for practical applications 
to most problems in physics and engineering, as the functions that we usually 
encounter are continuous (piecewise, at least) so that they are integrable by 
the Riemann procedure. In advanced subjects in mathematical physics, how- 
ever, we come to a class of highly irregular functions where the concept of an 
ordinary Riemann integral is not applicable. In order to treat such functions, 
we have to employ another, more flexible integral than the Riemann integral. 
In this chapter, we present a concise description of the Lebesgue integral. 
The Lebesgue integral not only overcomes many of the difficulties inherent in 
the use of the Riemann integral, but its study has also generated new concepts 
and techniques that are extremely valuable in practical problems in modern 
physics and engineering. 

At first, the cultivation of an intuitive feeling for the Lebesgue integral
as an adjunct to formal manipulations and calculations is important, and we
achieve this by comparing it with the Riemann integral. When defining the
Riemann integral of a function $f(x)$ on an interval $I = [a,b]$, we divide the
entire interval $[a,b]$ into small subintervals $\Delta x_k = [x_k, x_{k+1}]$ such that

$$a = x_1 < x_2 < \cdots < x_{n+1} = b.$$

The finite set $\{x_k\}$ of numbers is called a partition $P$ of the interval $I$. Using
this notation $P$, let us define, e.g., the sums

$$S_P(f) = \sum_{k=1}^{n} M_k\,(x_{k+1} - x_k), \qquad s_P(f) = \sum_{k=1}^{n} m_k\,(x_{k+1} - x_k),$$

where $M_k$ and $m_k$ are the supremum and infimum of $f(x)$ on the interval
$\Delta x_k = [x_k, x_{k+1}]$, respectively, given by

$$M_k = \sup_{x \in \Delta x_k} f(x), \qquad m_k = \inf_{x \in \Delta x_k} f(x). \tag{6.1}$$

Evidently, the relation $S_P(f) \ge s_P(f)$ holds if the function $f(x)$ is bounded
on the interval $I = [a,b]$. We take the limit inferior (or limit superior) of the
sums,

$$S(f) = \liminf S_P, \qquad s(f) = \limsup s_P, \tag{6.2}$$

where all possible choices of the partition $P$ are taken into account. The $S(f)$
and $s(f)$ are called the upper and lower Riemann-Darboux integrals of
$f$ over $I$, respectively. If the two coincide, i.e., if

$$S(f) = s(f) = A,$$

the common value $A$ is called the Riemann integral and the function $f(x)$
is called Riemann integrable such that

$$A = \int_a^b f(x)\,dx.$$


We note without proof that each of the following conditions ensures the existence of
the Riemann integral of a function $f(x)$.
1. $f(x)$ is continuous in $I = [a,b]$.
2. $f(x)$ is bounded and has only a finite number of discontinuities in $I = [a,b]$.
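The upper and lower sums can be computed directly for a simple case. The sketch below assumes a function that is nondecreasing on $[a,b]$, so that the supremum and infimum on each subinterval are attained at the right and left endpoints; the choice $f(x) = x^2$ on $[0,1]$ and the uniform partition are illustrative assumptions only.

```python
def darboux_sums(f, a, b, n):
    """Upper and lower sums S_P, s_P for a function nondecreasing on [a, b].

    A uniform partition with n subintervals is used. For a nondecreasing f,
    M_k = f(right endpoint) and m_k = f(left endpoint).
    """
    width = (b - a) / n
    upper = 0.0
    lower = 0.0
    for k in range(n):
        left = a + k * width
        right = left + width
        lower += f(left) * width    # m_k * (x_{k+1} - x_k)
        upper += f(right) * width   # M_k * (x_{k+1} - x_k)
    return upper, lower

def square(x):
    return x * x

for n in (10, 100, 1000):
    S, s = darboux_sums(square, 0.0, 1.0, n)
    print(n, S, s, S - s)
```

For this example the gap $S_P - s_P$ equals $[f(1) - f(0)]/n$, so both sums squeeze the common value $1/3$ as the partition is refined.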

On the other hand, when the function f(x) exhibits too many points of 
discontinuity, the above definition is of no use in forming the integral. An 
illustrative example is given below. 


Examples Assume an enumeration $\{z_n\}$ $(n = 1,2,\cdots)$ of the rational numbers
between 0 and 1 and let

$$f(x) = \begin{cases} 1 & (x = z_1, z_2, \cdots, z_n, \cdots), \\ 0 & \text{otherwise}. \end{cases}$$

That is, the function $f(x)$ has the value unity if $x$ is rational and the value
zero if $x$ is irrational. In any subinterval $\Delta x_k \subset [0,1]$,

$$m_k = 0, \qquad M_k = 1,$$

and

$$s_P = 0, \qquad S_P = 1.$$

Therefore, the upper and lower Darboux integrals are 1 and 0, respectively,
whence $f(x)$ has no Riemann integral.




6.1.2 Measure 


The shortcoming of the Riemann procedure demonstrated above can be
successfully overcome by employing Lebesgue's procedure. The latter requires a
systematic way of assigning a measure $\mu(X_i)$ to each subset of points $X_i$.
In the remainder of this section, we learn about the basic properties of
measure and related material, which serve as preliminaries to the
precise definition of Lebesgue integrals given in Sect. 6.2.

The measure for a subset of points is a generalization of the concepts of
length, area, and volume. Intuitively, it follows that the length of an interval
$[a,b]$ is $b - a$. Similarly, if we have two disjoint intervals $[a_1,b_1]$ and $[a_2,b_2]$,
it is natural to interpret the length of the set consisting of these two intervals
as the sum $(b_1 - a_1) + (b_2 - a_2)$. However, the 'length' of a set of points
of rational (or irrational) numbers on the line is not obvious. This context
requires a rigorous mathematical definition of a measure of a point set, as
shown below.


♠ Measure of a set of points:
A measure $\mu(X)$ defined on a set of points $X$ is a function with the
following two properties:
1. If the set $X$ is empty or consists of a single point, $\mu(X) = 0$; otherwise,
$\mu(X) \ge 0$.
2. The measure of the sum of two nonoverlapping sets is equal to the
sum of the measures of these sets expressed by

$$\mu(X_1 + X_2) = \mu(X_1) + \mu(X_2) \quad \text{for } X_1 \cap X_2 = \emptyset. \tag{6.3}$$

In the above statement, $X_1 + X_2$ denotes the set containing both elements of
$X_1$ and $X_2$, wherein each element is counted only once. If $X_1$ and $X_2$ overlap,
(6.3) is replaced by

$$\mu(X_1 + X_2) = \mu(X_1) + \mu(X_2) - \mu(X_1 \cap X_2)$$

so that the points common to $X_1$ and $X_2$ will be counted only once.

Various kinds of measures have been thus far introduced in mathematics.
Among them is the following important example of a measure that plays a
central role in the subsequent discussions. Consider a monotonically increasing
function $\alpha(x)$ and let $I$ be an interval (open, closed, or semiclosed) with
endpoints $a$ and $b$. We define the $\alpha$-measure of $I$, denoted by $\mu_\alpha(I)$, which
takes different values depending on the types of endpoints $a$ and $b$ as shown below.


♠ $\alpha$-measure of intervals:
The $\alpha$-measures of intervals are defined by
• $\mu_\alpha([a,b]) = \alpha(b^+) - \alpha(a^-)$ for the closed interval $[a,b]$,
• $\mu_\alpha((a,b]) = \alpha(b^+) - \alpha(a^+)$ for the semiclosed interval $(a,b]$,
• $\mu_\alpha([a,b)) = \alpha(b^-) - \alpha(a^-)$ for the semiclosed interval $[a,b)$,
• $\mu_\alpha((a,b)) = \alpha(b^-) - \alpha(a^+)$ for the open interval $(a,b)$,
where

$$\alpha(a^-) = \lim_{\varepsilon \to 0+} \alpha(a - \varepsilon) \quad \text{and} \quad \alpha(a^+) = \lim_{\varepsilon \to 0+} \alpha(a + \varepsilon).$$


By definition, the open interval $(a,a)$ is an empty set, so that $\mu_\alpha((a,a)) = 0$
for any $a \in \mathbb{R}$. The other cases of intervals $(a,a]$ and $[a,a)$ are also empty
sets. Note that $\mu_\alpha(I) \ge 0$ since $\alpha(x)$ is a monotonically increasing function.


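Examples Let $\alpha(x)$ be the monotonically increasing function (see Fig. 6.1)

$$\alpha(x) = \begin{cases} 0, & x < 1, \\ \tfrac{1}{2}, & x = 1, \\ 1, & x > 1. \end{cases} \tag{6.4}$$

We then have

$$\mu_\alpha([0,1)) = \alpha(1^-) - \alpha(0^-) = 0 - 0 = 0$$

and

$$\mu_\alpha([0,1]) = \alpha(1^+) - \alpha(0^-) = 1 - 0 = 1.$$

Similarly,

$$\mu_\alpha([1,2]) = \mu_\alpha([1,2)) = 1 - 0 = 1,$$
$$\mu_\alpha((1,2]) = \mu_\alpha((1,2)) = 1 - 1 = 0.$$


Fig. 6.1. The function $\alpha(x)$ defined in (6.4)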
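The four interval rules can be written out as a short routine. The sketch below assumes the step function of (6.4) read as $\alpha(x) = 0$ for $x < 1$, $1/2$ at $x = 1$, and $1$ for $x > 1$, and encodes its one-sided limits by hand; the helper names are hypothetical.

```python
def alpha_left(x):
    """alpha(x^-): limit of alpha from the left (assumed reading of (6.4))."""
    return 0.0 if x <= 1 else 1.0

def alpha_right(x):
    """alpha(x^+): limit of alpha from the right (assumed reading of (6.4))."""
    return 0.0 if x < 1 else 1.0

def mu_alpha(a, b, closed_left, closed_right):
    """alpha-measure of an interval with endpoints a <= b."""
    if a == b and not (closed_left and closed_right):
        return 0.0  # (a, a), (a, a] and [a, a) are empty sets
    upper = alpha_right(b) if closed_right else alpha_left(b)
    lower = alpha_left(a) if closed_left else alpha_right(a)
    return upper - lower

print(mu_alpha(0, 1, True, False))   # [0, 1)
print(mu_alpha(0, 1, True, True))    # [0, 1]
print(mu_alpha(1, 2, True, True))    # [1, 2]
print(mu_alpha(1, 2, False, False))  # (1, 2)
print(mu_alpha(1, 1, True, True))    # the single point {1}
```

Only the one-sided limits enter the result, so the value $\alpha(1) = 1/2$ at the jump itself plays no role; the whole jump is assigned to any interval that contains the point $x = 1$.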


6.1.3 The Probability Measure 


The significance of measure is understood by taking probability
theory as an example. Probability theory deals with statistical properties of a
random variable $x$ associated with an event occurring sequentially or
simultaneously, where it is assumed that the average of $x$ approaches a constant
value as the number of observations increases.

Given a random variable $x$, its expected (or mean) value is defined by
the integral

$$E\{x\} = \int_{-\infty}^{\infty} x\,p(x)\,dx, \tag{6.5}$$

where $p(x) \ge 0$ is the probability density function of the random variable
$x$ defined by

$$p(x) = \frac{dP(x)}{dx},$$

with the probability distribution function $P(x)$. The function $P(x)$
describes the probability that the event labeled $x$ occurs. It follows intuitively
that

$$P\{x_1 < x < x_2\} = \int_{x_1}^{x_2} p(x)\,dx \tag{6.6}$$

and

$$\int_{-\infty}^{\infty} p(x)\,dx = 1.$$


Examples For a discrete random variable $\{x_i\}$, the integral of (6.5) can be
written as a sum:

$$E\{x\} = \sum_i x_i p_i.$$

In an experiment with dice, e.g., the probability of each event is given by

$$p_1 = p_2 = \cdots = p_6 = \frac{1}{6},$$

which yields

$$E\{x_i\} = \sum_{i=1}^{6} x_i p_i = \frac{7}{2}.$$


In probability theory, the probability distribution function $P(x)$ plays the
role of measure. Assume a set of continuous real numbers, $X = \{x \le a\}$, and
let the function $\alpha(a)$ be the probability that $x$ has a value no greater than $a$.
The function $\alpha(a)$ then reads

$$\alpha(a) = P(x \le a), \tag{6.7}$$

where $\alpha(-\infty^+) = 0$ and $\alpha(\infty^-) = 1$. Note that $\alpha(a)$ is a monotonically
increasing function. We have as well

$$P\{x_1 < x \le x_2\} = \alpha(x_2) - \alpha(x_1),$$

since

$$P\{x \le x_2\} = P\{x \le x_1\} + P\{x_1 < x \le x_2\}.$$

Therefore, we see that the probability distribution function $P(x \in I)$
corresponds to the $\alpha$-measure for any interval $I$, as expressed by

$$\mu_\alpha(I) = P(x \in I),$$

which behaves as $0 \le \mu_\alpha(I) \le 1$ for any $I$.


Remark. The mean value (6.5) of a random variable $x$ can be interpreted as a
Riemann-Stieltjes integral, rather than as an ordinary Riemann integral.
To see this, we observe that the Riemann integral (6.5) can be approximated by
the Riemann sum as

$$\int_{-\infty}^{\infty} x\,p(x)\,dx \simeq \sum_{k=-\infty}^{\infty} \xi_k\,p(\xi_k)\,(x_{k+1} - x_k), \tag{6.8}$$

where $\xi_k$ is any point on $\Delta x_k$. Since $p(\xi_k)(x_{k+1} - x_k) \simeq \Delta P\{x_k < x < x_{k+1}\}$
from (6.6), the mean value is written in the form

$$E\{x\} = \int_{-\infty}^{\infty} x\,dP(x), \tag{6.9}$$

which is called the Riemann-Stieltjes integral of $x$ with respect to $P(x)$.
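The Riemann-Stieltjes form can be evaluated even when no density exists. The sketch below is an illustration under stated assumptions: it uses the distribution function of a fair die (a step function with jumps of $1/6$ at $x = 1, \ldots, 6$), a uniform partition of $[0,7]$, and the midpoint of each subinterval as the sample point $\xi_k$. The function names are hypothetical.

```python
import math

def dice_cdf(x):
    """P(x): probability that a fair die shows a value no greater than x."""
    return min(max(math.floor(x), 0), 6) / 6.0

def stieltjes_mean(cdf, a, b, n):
    """Approximate the integral of x dP(x) over [a, b] with n equal subintervals."""
    total = 0.0
    for k in range(n):
        x_left = a + (b - a) * k / n
        x_right = a + (b - a) * (k + 1) / n
        xi = 0.5 * (x_left + x_right)                # any point of the subinterval
        total += xi * (cdf(x_right) - cdf(x_left))   # xi_k * Delta P
    return total

mean = stieltjes_mean(dice_cdf, 0.0, 7.0, 7000)
print(mean)
```

Only the subintervals that contain a jump of $P(x)$ contribute, so the sum reduces to $\sum_i x_i p_i$ up to an error of the order of the subinterval width.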


6.1.4 Support and Area of a Step Function 


What follows is an important concept that we use together with the concept
of measure to introduce the definition of the Lebesgue integral. Let $I_i$
$(i = 1, 2, \cdots, n)$ be disjoint intervals, and consider the step function $\theta(x)$ given by

$$\theta(x) = \begin{cases} c_i, & x \in I_i;\ i = 1, 2, \cdots, n, \\ 0, & \text{otherwise}, \end{cases}$$

where the set $\{c_1, c_2, \cdots, c_n\}$ consists of finite real numbers. We see that
$\theta$ is constant on each interval $I_i$, and zero elsewhere. We now introduce the
following concept:


♠ Support of a step function:
The disjoint set $S = I_1 \cup I_2 \cup \cdots \cup I_n \subset I$ on which $\theta$ is nonzero is called
the support of $\theta(x)$.


An example of the support of $\theta(x)$ is depicted in Fig. 6.2. When the support
of a step function $\theta$ has a finite total length, we associate it with the area
$A(\theta)$ between the graph of $\theta$ and the $x$-axis, with the usual rule that areas
below the $x$-axis have a negative sign. We refer to $A(\theta)$ as the area under
the graph of $\theta$.


Fig. 6.2. The disjoint set $S = I_1 \cup I_2 \cup \cdots$ that serves as the support of $\theta(x)$

Concepts such as support and area can apply to a linear combination of
step functions. Suppose that $\theta_1, \theta_2, \cdots, \theta_n$ are step functions on the same
interval $I$, all with supports of finite total length, and that $a_1, a_2, \cdots, a_n$ are
finite real numbers. Then, the function $\Theta(x)$ defined by

$$\Theta(x) = \sum_{j=1}^{n} a_j \theta_j(x) \quad \text{for } x \in I$$

is also a step function on $I$. The support of $\Theta(x)$ has a finite length and the
area under the graph of $\Theta(x)$ is given by

$$A(\Theta) = \sum_{j=1}^{n} a_j A(\theta_j).$$

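Examples Let $\theta_1, \theta_2 : [0,3) \to \mathbb{R}$ be defined by

$$\theta_1(x) = \begin{cases} 1 & \text{for } 0 \le x < 2, \\ 2 & \text{for } 2 \le x < 3, \end{cases} \qquad \theta_2(x) = \begin{cases} -1 & \text{for } 0 \le x \le 1, \\ 1 & \text{for } 1 < x < 3. \end{cases} \tag{6.10}$$

Let $\Theta = 2\theta_1 - \theta_2$. Then

$$\Theta(x) = \begin{cases} 3 & \text{for } [0,1], \\ 1 & \text{for } (1,2), \\ 3 & \text{for } [2,3). \end{cases} \tag{6.11}$$

These are plotted in Fig. 6.3. Clearly $\Theta$ is a step function. Note also that the
areas are

$$A(\theta_1) = 2(1) + 1(2) = 4, \qquad A(\theta_2) = 1(-1) + 2(1) = 1,$$

and

$$A(\Theta) = 1(3) + 1(1) + 1(3) = 7 = 2A(\theta_1) - A(\theta_2).$$


Fig. 6.3. The functions $\theta_1(x)$, $\theta_2(x)$, $\Theta(x)$ given in (6.10) and (6.11), respectively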
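The linearity of the area can be verified mechanically. In the sketch below, a step function is represented as a list of `(left, right, value)` pieces; the two functions are one reading of (6.10) that reproduces the areas quoted above ($A(\theta_1) = 4$, $A(\theta_2) = 1$), and whether endpoints are open or closed is ignored because single points do not affect the area. All names are illustrative.

```python
def area(pieces):
    """Area under a step function given as (left, right, value) pieces."""
    return sum(value * (right - left) for left, right, value in pieces)

def value_at(pieces, x):
    """Value of the step function at x (zero outside all pieces)."""
    for left, right, value in pieces:
        if left <= x < right:
            return value
    return 0

def linear_combination(coeffs, functions):
    """Step function sum_j a_j * theta_j on the common refinement of all pieces."""
    points = sorted({p for f in functions for (l, r, _) in f for p in (l, r)})
    combined = []
    for left, right in zip(points, points[1:]):
        mid = (left + right) / 2  # each theta_j is constant between breakpoints
        value = sum(c * value_at(f, mid) for c, f in zip(coeffs, functions))
        combined.append((left, right, value))
    return combined

# Assumed reading of (6.10).
theta1 = [(0, 2, 1), (2, 3, 2)]
theta2 = [(0, 1, -1), (1, 3, 1)]

Theta = linear_combination([2, -1], [theta1, theta2])
print(Theta)
print(area(Theta), 2 * area(theta1) - area(theta2))
```

The combined function is built on the common refinement of the two sets of breakpoints, which is why $\Theta$ has three pieces although $\theta_1$ and $\theta_2$ have two each.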


6.1.5 $\alpha$-Summability


Now, we combine the concepts of $\alpha$-measure and support of a step function.
Let $\alpha(x)$ be a monotonically increasing function, $I$ be any interval, and $\theta(x)$
be a step function. We further assume that the support of $\theta$ is a simple set,
i.e., the union of a finite collection of disjoint intervals. For example, the set
$S = \bigcup_{k=1}^{n} I_k$ is a simple set if $I_1, I_2, \cdots, I_n$ are disjoint intervals. Then, the
$\alpha$-measure of $S$ is given by

$$\mu_\alpha(S) = \sum_{k=1}^{n} \mu_\alpha(I_k).$$

Observe that the value of $\mu_\alpha(S)$ is independent of the way in which the set $S$
is subdivided. Note also that

(i) $\mu_\alpha(S) \ge 0$ for any simple set $S$, and
(ii) if $S$ and $T$ are simple sets such that $S \subset T$, then $\mu_\alpha(S) \le \mu_\alpha(T)$.

We are now ready to present the following statement:


♠ $\alpha$-summability:
A step function $\theta(x)$ is $\alpha$-summable if the support of $\theta$ has a finite
$\alpha$-measure with respect to a given monotonically increasing function $\alpha(x)$.


Given an $\alpha$-summable step function $\theta(x)$, we associate it with a real number
$A_\alpha(\theta)$ defined by

$$A_\alpha(\theta) = \sum_{k=1}^{n} c_k\,\mu_\alpha(I_k), \tag{6.12}$$

where $c_k$ is the amplitude of the step function $\theta(x)$ for $x \in I_k$. In general, $A_\alpha(\theta)$
can be thought of as a generalized area. For example, when setting $\alpha(x) = x$,
the measure $\mu_\alpha(I_k)$ turns out to be just the ordinary length of the interval
$I_k$, and then $A_\alpha(\theta)$ is just the area $A(\theta)$ under the graph of $\theta$, as defined in
Sect. 6.1.4. However, if $\alpha(x)$ has a more complicated functional form, we get a
different value of $A_\alpha(\theta)$ from the above since in that case a length along the
$x$-axis should be measured by the $\alpha$-measure rather than by ordinary length.
An example of an actual calculation of $A_\alpha(\theta)$ is provided in Exercise 2.


Remark. We shall see in Sect. 6.2.2 that the Lebesgue integral is defined by
the limit $n \to \infty$ of the sum in (6.12).


6.1.6 Properties of $\alpha$-summable functions

We list some basic properties of $\alpha$-summable step functions without proof.


• If $\theta(x)$ is a nonnegative $\alpha$-summable step function with respect to a given
$\alpha(x)$, then $A_\alpha(\theta) \ge 0$ and $A_\alpha(0) = 0$.

• If $\theta_1$ and $\theta_2$ are $\alpha$-summable step functions on the same interval $I$ such
that $\theta_1 \le \theta_2$ on $I$, then $A_\alpha(\theta_1) \le A_\alpha(\theta_2)$.

• Let $\{\theta_m\}$ be a set of $\alpha$-summable step functions on the same interval $I$, and
let $\{a_m\}$ be finite real numbers. By defining $\theta : I \to \mathbb{R}$ as

$$\theta(x) = \sum_{j=1}^{m} a_j \theta_j(x)$$

for all $x \in I$ ($\theta$ is also an $\alpha$-summable step function on $I$), we have

$$A_\alpha(\theta) = \sum_{j=1}^{m} a_j A_\alpha(\theta_j).$$


Exercises 


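1. Assume a monotonically increasing function $\alpha(x)$ defined by

$$\alpha(x) = \begin{cases} 0, & x \in (-\infty, 1), \\ x^2 - 2x + 2, & x \in [1,2), \\ 3, & x = 2, \\ x + 2, & x \in (2,\infty). \end{cases}$$

Calculate $A_\alpha(\theta)$ for each of the two step functions:

$$\theta_1(x) = \begin{cases} -1, & 0 \le x < 1, \\ 2, & 1 \le x < 3, \end{cases}$$

and

$$\theta_2(x) = \begin{cases} -1, & 0 \le x \le 1, \\ 2, & 1 < x < 3. \end{cases}$$

Solution: For $\theta_1$, since

$$\mu_\alpha([0,1)) = \alpha(1^-) - \alpha(0^-) = 0 - 0 = 0,$$
$$\mu_\alpha([1,3)) = \alpha(3^-) - \alpha(1^-) = 5 - 0 = 5,$$

we have

$$A_\alpha(\theta_1) = (-1)0 + 2(5) = 10.$$

For $\theta_2$, on the other hand, we have a different result since

$$\mu_\alpha([0,1]) = \alpha(1^+) - \alpha(0^-) = 1 - 0 = 1,$$
$$\mu_\alpha((1,3)) = \alpha(3^-) - \alpha(1^+) = 5 - 1 = 4,$$

which yields

$$A_\alpha(\theta_2) = (-1)1 + 2(4) = 7.$$

It is noteworthy that the values of $A_\alpha(\theta_1)$ and $A_\alpha(\theta_2)$ are different,
although the area $A(\theta)$ for them is the same. The difference comes
from the fact that $\alpha$ has a discontinuity at the single point where
$\theta_1$ and $\theta_2$ have different values. ♣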


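2. Evaluate $A_\alpha(\theta)$ of the step function:


which is associated with the $\alpha$-measure:

$$\alpha(x) = \begin{cases} 0, & x < 0, \\ \tfrac{1}{2}, & x = 0, \\ 1, & x > 0. \end{cases}$$

Solution: Since

$$\mu_\alpha((-\infty,0]) = \alpha(0^+) - \alpha(-\infty^+) = 1 - 0 = 1,$$
$$\mu_\alpha((0,\infty)) = \alpha(\infty^-) - \alpha(0^+) = 1 - 1 = 0,$$

we have

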
3. Show that the function

$$f(x) = \lim_{m \to \infty} \lim_{n \to \infty} (\cos 2\pi m! x)^n,$$

called Dirichlet's function, takes the form

$$f(x) = \begin{cases} 1 & \text{for all rational numbers } x, \\ 0 & \text{otherwise}. \end{cases}$$

Solution: When $x$ is a rational number, it is expressed by a
fraction $p/q$ with relatively prime integers $p$ and $q$. Hence, for
sufficiently large $m$ (namely $m \ge q$), the product $m!x$ becomes an integer since

$$m!x = m \cdot (m-1) \cdots (q+1) \cdot p \cdot (q-1) \cdots 2 \cdot 1.$$

Thus we have $\cos 2\pi m! x = 1$. Otherwise, if $x$ is an irrational
number, $m!x$ is also irrational for any $m$, so that $|\cos 2\pi m! x| < 1$
and $(\cos 2\pi m! x)^n \to 0$ as $n \to \infty$. As a result, we obtain

$$\lim_{m \to \infty} \lim_{n \to \infty} (\cos 2\pi m! x)^n = \begin{cases} 1 : & x \text{ is rational}, \\ 0 : & x \text{ is irrational}. \end{cases} \quad ♣$$
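The quantity $A_\alpha(\theta)$ of (6.12) can be computed with a few lines of code. The sketch below uses the function $\alpha(x)$ of Exercise 1 with its one-sided limits written out by hand, and two step functions chosen to match the interval measures quoted in that solution (an assumption about the exercise data). Each piece is stored as `(a, b, closed_left, closed_right, amplitude)`; all names are hypothetical.

```python
def alpha_left(x):
    """alpha(x^-) for the piecewise function of Exercise 1."""
    if x <= 1:
        return 0.0
    if x <= 2:
        return x * x - 2 * x + 2
    return x + 2.0

def alpha_right(x):
    """alpha(x^+) for the piecewise function of Exercise 1."""
    if x < 1:
        return 0.0
    if x < 2:
        return x * x - 2 * x + 2
    return x + 2.0

def mu_alpha(a, b, closed_left, closed_right):
    """alpha-measure of a nonempty interval with endpoints a < b."""
    upper = alpha_right(b) if closed_right else alpha_left(b)
    lower = alpha_left(a) if closed_left else alpha_right(a)
    return upper - lower

def generalized_area(pieces):
    """A_alpha(theta) = sum_k c_k * mu_alpha(I_k), cf. (6.12)."""
    return sum(c * mu_alpha(a, b, cl, cr) for (a, b, cl, cr, c) in pieces)

# theta_1 = -1 on [0, 1), 2 on [1, 3);  theta_2 = -1 on [0, 1], 2 on (1, 3).
theta1 = [(0, 1, True, False, -1), (1, 3, True, False, 2)]
theta2 = [(0, 1, True, True, -1), (1, 3, False, False, 2)]

print(generalized_area(theta1), generalized_area(theta2))
```

The two step functions differ only at $x = 1$, yet the results differ because the jump of $\alpha$ at that point is attributed to whichever interval contains it.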


6.2 Lebesgue Integral 


6.2.1 Lebesgue Measure 


The Lebesgue integral procedure essentially reduces to finding a measure for
sets of arguments. In particular, if a set consists of too many points of
discontinuity, we need a way to define its measure that is known as the Lebesgue
measure. In this subsection, we explain how to construct the Lebesgue
measure of a point set.

As a simple example, let us consider a finite interval [a,b] of length L. 
This can be decomposed into two sets: a set X consisting of some of the 
points x € [a,b] and its complementary set X’ consisting of all points 
x € [a,b] that do not belong to X. A schematic view of X and X’ is shown 
in Fig. 6.4. Both X and X’ may be sets of several continuous line segments 
or sets of isolated points. 

We would like to evaluate the measure of $X$. To do this, we cover the set
of points $X$ by nonoverlapping intervals $\Delta_k \subset [a,b]$ such that

$$X \subset (\Delta_1 + \Delta_2 + \cdots).$$


Fig. 6.4. A set $X$ and its complementary set $X'$


If we denote the length of $\Delta_k$ by $\ell_k$, the sum of the $\ell_k$ must satisfy the
inequality

$$0 \le \sum_k \ell_k \le L.$$

In particular, the smallest value (infimum) of the sum $\sum_k \ell_k$ is referred to as the outer
measure of $X$ and is denoted by

$$\mu_{\rm out}(X) = \inf\left(\sum_k \ell_k\right).$$

In the same manner, we can find intervals $\Delta_k' \subset [a,b]$ of lengths $\ell_1', \ell_2', \cdots$
that cover the complementary set $X'$ such that

$$X' \subset (\Delta_1' + \Delta_2' + \cdots), \qquad 0 \le \sum_k \ell_k' \le L.$$

Here we define another kind of measure denoted by

$$\mu_{\rm in}(X) = L - \mu_{\rm out}(X') = L - \inf\left(\sum_k \ell_k'\right), \tag{6.13}$$

which is called the inner measure of $X$. Note that the inner measure of $X$ is
defined by the outer measure of $X'$, not of $X$. It is a straightforward matter
to prove the inequality

$$0 \le \mu_{\rm in}(X) \le \mu_{\rm out}(X). \tag{6.14}$$

Specifically, if

$$\mu_{\rm in}(X) = \mu_{\rm out}(X),$$

it is called the Lebesgue measure of the point set $X$, denoted by $\mu(X)$.
Clearly, when $X$ contains all the points of $[a,b]$, the smallest interval that
covers $[a,b]$ is $[a,b]$ itself, and thus $\mu(X) = L$.


Our results are summarized below.


♠ Lebesgue measure:
A set of points $X$ is said to be measurable with the Lebesgue
measure $\mu(X)$ if and only if $\mu_{\rm in}(X) = \mu_{\rm out}(X) \equiv \mu(X)$.


Remark. An unbounded point set $X$ is measurable if and only if $(-c,c) \cap X$ is
measurable for all $c > 0$. In this case, we define $\mu(X) = \lim_{c \to \infty} \mu[(-c,c) \cap X]$,
which may or may not be finite.


6.2.2 Definition of the Lebesgue Integral 


We are now in a position to define the Lebesgue integral. Let the function
$f(x)$ defined on a set $X$ be bounded:

$$0 \le f_{\min} \le f(x) \le f_{\max}.$$

We partition the ordinate axis by the sequence $\{f_k\}$ $(1 \le k \le n)$ so that
$f_1 = f_{\min}$ and $f_n = f_{\max}$. Since every point $x \in X$ is assigned exactly one
value $f(x)$, there should exist sets $X_k$ of values $x$ such that

$$f_k \le f(x) < f_{k+1} \quad \text{for } x \in X_k \quad (1 \le k \le n-1), \tag{6.15}$$

as well as a set $X_n$ of values $x$ such that $f(x) = f_n$. Each set $X_k$ assumes a
measure $\mu(X_k)$. Thus we form the sum of products $f_k \cdot \mu(X_k)$ of all possible
values of $f$, called the Lebesgue sum:

$$\sum_{k=1}^{n} f_k \cdot \mu(X_k). \tag{6.16}$$

If the sum (6.16) converges to a finite value when taking the limit $n \to \infty$
such that

$$\max |f_k - f_{k+1}| \to 0,$$

then the limiting value of the sum is called the Lebesgue integral of $f(x)$
over the set $X$.

The formal definition of the Lebesgue integral is given below.
The formal definition of the Lebesgue integral is given below. 


♠ Lebesgue integral:
Let $f(x)$ be a nonnegative function defined on a measurable set $X$
and divide $X$ into a finite number of subsets such as

$$X = X_1 + X_2 + \cdots + X_n. \tag{6.17}$$

Let $f_k = \inf_{x \in X_k} f(x)$ to form the sum

$$\sum_{k=1}^{n} f_k\,\mu(X_k). \tag{6.18}$$

Then the Lebesgue integral of $f(x)$ on $X$ is defined by

$$\int_X f\,d\mu \equiv \lim_{\max|f_k - f_{k+1}| \to 0} \left[\sum_{k=1}^{n} f_k\,\mu(X_k)\right],$$

where all possible choices of partition (6.17) are considered.


Figure 6.5 is a schematic illustration of the Lebesgue procedure. Obviously,
the value of the Lebesgue sum (6.16) depends on our choice of partition.
If we take an alternative partition instead of (6.17), the value of the sum
also changes. Among the infinite variety of choices, the partition that
maximizes the sum (6.18) gives the Lebesgue integral of $f(x)$. That a function
is Lebesgue integrable means that the limit superior of the sum (6.18) is
determined independently of our choice of the partition of the $x$-axis.


Fig. 6.5. An illustration of the Lebesgue procedure
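The Lebesgue sum can be evaluated explicitly when the sets $X_k$ are easy to describe. The sketch below is an illustration under assumptions not taken from the text: it uses $f(x) = x^2$ on $X = [0,1]$, slices the range $[0,1]$ into $n$ equal levels $f_k = k/n$, and uses the fact that for this monotone $f$ each set $X_k$ is an interval whose measure is its length.

```python
import math

def lebesgue_sum(n):
    """Lebesgue sum for f(x) = x**2 on X = [0, 1] with n equal slices of the range.

    X_k = {x : f_k <= x**2 < f_{k+1}} is the interval [sqrt(f_k), sqrt(f_{k+1})),
    whose measure is sqrt(f_{k+1}) - sqrt(f_k). The remaining set {x : f(x) = 1}
    is a single point of measure zero and is omitted.
    """
    total = 0.0
    for k in range(n):
        f_low = k / n
        f_high = (k + 1) / n
        measure = math.sqrt(f_high) - math.sqrt(f_low)
        total += f_low * measure   # f_k * mu(X_k)
    return total

for n in (10, 100, 1000):
    print(n, lebesgue_sum(n))
```

Because each term uses the lower level $f_k$, the sums approach the value $1/3$ from below, and the deficit is bounded by the slice height $1/n$ times $\mu(X) = 1$.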


6.2.3 Riemann Integrals vs. Lebesgue Integrals 


Before proceeding further with this discussion, we compare the definitions of
Riemann and Lebesgue integrals for a better understanding of the significance
of the latter. In the language of measure, the Riemann integral of a function
$f(x)$ defined on the set $X$ is obtained by dividing $X$ into nonoverlapping
subsets $X_i$ as

$$X = X_1 + X_2 + \cdots + X_n, \qquad X_i \cap X_j = \emptyset \quad \text{for any } i \ne j,$$

followed by setting the Riemann sum

$$\sum_{k=1}^{n} f(\xi_k)\,\mu(X_k). \tag{6.19}$$

Here, the measure $\mu(X_k)$ is identified with the length of the subset $X_k$, and
$\xi_k$ assumes any point that belongs to $X_k$. We increase the number of subsets
$n \to \infty$ such that

$$\mu(X_k) \to 0 \quad \text{for any } X_k,$$

and if the limit of the sum (6.19) exists and is independent of the subdivision
process, it is called the Riemann integral of $f(x)$ over $X$. Obviously, the
Riemann integral can be defined under the condition that all values of $f(x)$
defined over $X_k$ tend to a common limit as $\mu(X_k) \to 0$. Such a requirement
excludes any possibility of defining the Riemann integral for functions having
too many points of discontinuity.


Remark. In view of the analogy between the sums (6.12) and (6.18), we may
say that, in a sense, the Lebesgue integral is the limit $n \to \infty$ of the quantity
$A_\alpha(\theta)$.


Although the Lebesgue sum given in (6.16) is apparently similar to the
Riemann sum given in (6.19), they are intrinsically different. In the Riemann
sum (6.19), $f(\xi_k)$ is the value of $f(x)$ at an arbitrary point $\xi_k \in X_k$. Thus the
value of $\xi_k$ is allowed to vary within each subset, which causes an indefiniteness
in the value of $f(\xi_k)$ within each subset. On the other hand, in the Lebesgue
sum (6.16), the value of $f_k$ corresponding to each subset $X_k$ has a definite
value. Therefore, for the existence of the Lebesgue integral, we no longer need
local smoothness of $f(x)$. As a result, the conditions imposed on the
integrated function become very weak compared with the case of the Riemann
integral.


6.2.4 Properties of the Lebesgue Integrals 


Several properties of the Lebesgue integral are given below without proof. 


1. If $f(x)$ is Lebesgue integrable on $X$ and if $X = X_1 + X_2 + \cdots + X_n$,
then

$$\int_X f\,d\mu = \sum_{k=1}^{n} \int_{X_k} f\,d\mu.$$

2. If two functions $f(x)$ and $g(x)$ are both Lebesgue integrable on $X$ and if
$f(x) \le g(x)$ for any $x \in X$, then

$$\int_X f\,d\mu \le \int_X g\,d\mu.$$

3. If $\mu(X) = 0$, then $\int_X f(x)\,dx = 0$.
4. If the integral $\int_X f(x)\,dx$ is finite, then the subset of $X$ defined by

$$X' = \{x \mid f(x) = \pm\infty\}$$

has zero measure. This means that in order for the integral to converge,
the measure of a set of points $x$ at which $f(x)$ diverges is necessarily zero.
5. Suppose that $\int_X f(x)\,dx$ is finite and that $X' \subset X$. If we make $\mu(X') \to 0$,
then

$$\int_{X'} f\,d\mu \to 0.$$
x’ 


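6. When $f(x)$ on $X$ takes both positive and negative values, its Lebesgue
integral is defined by

$$\int_X f\,d\mu = \int_X f^+\,d\mu + \int_X f^-\,d\mu \tag{6.20}$$

and

$$\int_X |f|\,d\mu = \int_X f^+\,d\mu - \int_X f^-\,d\mu, \tag{6.21}$$

where

$$f^+(x) = \begin{cases} f(x) & \text{for } \{x;\ f(x) \ge 0\}, \\ 0 & \text{for } \{x;\ f(x) < 0\}, \end{cases}$$

and

$$f^-(x) = \begin{cases} 0 & \text{for } \{x;\ f(x) \ge 0\}, \\ f(x) & \text{for } \{x;\ f(x) < 0\}. \end{cases}$$

Definition (6.20) is justified except when both integrals on the right-hand
side diverge.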


6.2.5 Null-Measure Property of Countable Sets 


Let us show that any countable set has a Lebesgue measure equal to zero. A 
rigorous definition of countable sets is given herewith. 


♠ Countable set:
A finite or infinite set $X$ is countable (or enumerable) if and only if
it is possible to establish a reciprocal one-to-one correspondence between
its elements and the elements of a set of integers.


It follows that every finite set is countable and that every subset of a countable
set is also countable. Any countable set is associated with a specific number,
called the cardinal number, defined below.


♠ Cardinal numbers:
Two sets $X_1$ and $X_2$ are said to have the same cardinal number if and
only if there exists a reciprocal one-to-one correspondence between their
respective elements.


Remark. A set $X$ is called an infinite set if it has the same cardinal number
as one of its proper subsets; otherwise, $X$ is called a finite set.


It should be stressed that an infinite set may or may not be countable. When
a given infinite set is countable, then its cardinal number is denoted by $\aleph_0$,
which is the same as the cardinal number of the set of the positive integers.
Furthermore, the cardinal number of the set of all real numbers (or
the set of points on a continuous line), which is a noncountable set, is denoted
by $\aleph$. Cardinal numbers of infinite sets, such as $\aleph_0$ and $\aleph$, are called
transfinite numbers.

The most important property of countable sets in terms of measure theory 
is given below. 


♠ Theorem:
Any countable set (finite or infinite) has a Lebesgue measure of zero,
namely, null measure.


Examples An illustrative example is the set of rational numbers, which has
measure zero (see Exercise 1). The countability of this set follows from the
fact that the rational numbers in $[0,1]$ can be arranged in a sequence of fractions as

$$\frac{0}{1},\ \frac{1}{1},\ \frac{1}{2},\ \frac{1}{3},\ \frac{2}{3},\ \frac{1}{4},\ \frac{3}{4},\ \frac{1}{5},\ \cdots.$$

Accordingly, since the set of all rational numbers in the interval $[0,1]$ has zero
measure, the Lebesgue integral of Dirichlet's function $f(x)$ over this interval
is well defined and equal to zero.

Another well-known example of the set of measure zero is the Cantor set,
which is demonstrated in Exercise 2.
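The covering argument behind the null measure of a countable set can be made concrete. The sketch below enumerates the rational numbers of $[0,1]$ by increasing denominator (truncated at an arbitrary maximum denominator, which is an assumption made only to keep the computation finite), assigns the $k$th point an interval of length $\varepsilon/2^k$, and adds up the lengths exactly with Python's `fractions.Fraction`.

```python
from fractions import Fraction

def rationals_in_unit_interval(max_denominator):
    """Rationals p/q in [0, 1] with q <= max_denominator, each listed once."""
    seen = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(0, q + 1):
            x = Fraction(p, q)
            if x not in seen:
                seen.add(x)
                ordered.append(x)
    return ordered

def cover_length(points, epsilon):
    """Total length when the k-th point (k = 1, 2, ...) gets an interval of length epsilon / 2**k."""
    return sum(epsilon / 2 ** k for k in range(1, len(points) + 1))

epsilon = Fraction(1, 1000)
points = rationals_in_unit_interval(30)
total = cover_length(points, epsilon)

print(len(points), float(total))
```

However many points are included, the total length stays below $\varepsilon$, because the lengths form a geometric series; this is why the outer measure of the set can be made arbitrarily small.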


6.2.6 The Concept of Almost Everywhere


We have observed that sets of measure zero make no contribution to Lebesgue
integrals. This fact provides a concept of an equality almost everywhere
for measurable functions, which plays an important role in developing the
theory of functional analysis.


♠ Equality almost everywhere:
Two functions $f(x)$ and $g(x)$ defined on the same set $X$ are said to be
equal almost everywhere with respect to a measure $\mu(X)$ if

$$\mu\{x \in X;\ f(x) \ne g(x)\} = 0.$$


We extend this terminology to other circumstances as well. In general, a
property is said to hold almost everywhere on $X$ if it holds at all points of $X$
except on a set of measure zero. Thus two functions $f(x)$ and $g(x)$ are said to
be equivalent (written $f \sim g$) if they coincide almost everywhere. For
example, Dirichlet's function mentioned earlier is equivalent to
the function $g(x) \equiv 0$.

Since the behavior of functions on sets of measure zero is often unimpor- 
tant, it is natural to introduce the following generalization of the ordinary 
notion of the convergence of a sequence of functions: 


♠ Convergence almost everywhere:
A sequence of functions $\{f_n(x)\}$ defined on a set $X$ is said to converge
almost everywhere to a function $f(x)$ if

$$\lim_{n \to \infty} f_n(x) = f(x) \tag{6.22}$$

for all $x \in X$ except for points of measure zero.


Examples A typical example is the sequence

$$\{f_n(x)\} = \{(-x)^n\}$$

defined on $[0,1]$. It converges almost everywhere to the function $f(x) \equiv 0$; in
fact it converges everywhere except at the point $x = 1$.


Exercises 


1. Show that the set of all rational numbers in the interval $[0,1]$ has a
Lebesgue measure equal to zero.


Solution: Let $X$ be the set of rational numbers in $[0,1]$, denote by $X'$
the set of irrational numbers that is complementary to $X$, and denote
the entire interval $[0,1]$ by $I$. Since $\mu(I) = 1$,
the outer measure of $X'$ reads

$$\mu_{\rm out}(X') = \mu_{\rm out}(I - X) = \mu_{\rm out}(I) - \mu_{\rm out}(X) = 1 - \mu_{\rm out}(X).$$

By definition, the inner measure of $X$ is given by

$$\mu_{\rm in}(X) = \mu_{\rm in}(I) - \mu_{\rm out}(X') = 1 - [1 - \mu_{\rm out}(X)] = \mu_{\rm out}(X).$$

The last equality asserts that the set $X$ is Lebesgue measurable.
The remaining task is to show that $\mu(X) = 0$.
Let $x_k$ $(k = 1, 2, \cdots, n, \cdots)$ denote the points of rational
numbers in the interval $I$. We cover each point $x_1, x_2, \cdots, x_n, \cdots$ by
an open interval of length $\varepsilon/2, \varepsilon/2^2, \cdots, \varepsilon/2^n, \cdots$, respectively,
where $\varepsilon$ is an arbitrary positive number. Since these intervals may
overlap, the entire set can be covered by an open set of measure
not greater than

$$\sum_{n=1}^{\infty} \frac{\varepsilon}{2^n} = \frac{\varepsilon}{2\left(1 - \frac{1}{2}\right)} = \varepsilon.$$

Since $\varepsilon$ can be made arbitrarily small, we find that $\mu_{\rm out}(X) = 0$.
Hence, from (6.14) we immediately have $\mu(X) = 0$. ♣

2. Evaluate the measure of a Cantor set, an infinite set constructed as
follows: (i) From the closed interval $[0,1]$, delete the open interval $(1/3, 2/3)$
that forms its middle third; (ii) from each of the remaining intervals
$[0,1/3]$ and $[2/3,1]$ delete the middle third; (iii) continue this process
of deleting the middle thirds indefinitely to obtain the point set on the
line that remains after all these open intervals have been removed.


Solution: Observe that at the $k$th step, we have thrown out $2^{k-1}$
adjacent intervals of length $1/3^k$. Thus the sum of the lengths of
the intervals removed is equal to

$$\frac{1}{3} + \frac{2}{9} + \frac{4}{27} + \cdots = \sum_{k=1}^{\infty} \frac{2^{k-1}}{3^k} = \frac{1/3}{1 - 2/3} = 1.$$

This is just the measure of the open set $P'$ that is the complement
of $P$. Therefore, the Cantor set $P$ itself has null measure

$$\mu(P) = 1 - \mu(P') = 1 - 1 = 0. \quad ♣$$
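3. Show that if $f(x)$ is nonnegative and integrable on $X$, then, for any $c > 0$,

$$\mu\{x \in X;\ f(x) \ge c\} \le \frac{1}{c} \int_X f\,d\mu,$$

which is known as Chebyshev's inequality.


Solution: Set $X' = \{x \in X;\ f(x) \ge c\}$ to observe that

$$\int_X f\,d\mu = \int_{X'} f\,d\mu + \int_{X - X'} f\,d\mu \ge \int_{X'} f\,d\mu \ge c\,\mu(X'). \quad ♣$$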


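4. Show that if $\int_X |f|\,d\mu = 0$, then $f(x) = 0$ almost everywhere.


Solution: By Chebyshev's inequality,

$$\mu\left\{x \in X;\ |f(x)| \ge \frac{1}{n}\right\} \le n \int_X |f|\,d\mu = 0$$

for all $n = 1, 2, \cdots$. Therefore, we have

$$\mu\{x \in X;\ f(x) \ne 0\} \le \sum_{n=1}^{\infty} \mu\left\{x \in X;\ |f(x)| \ge \frac{1}{n}\right\} = 0. \quad ♣$$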
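The geometric series in Exercise 2 can be summed exactly for any finite number of steps. The sketch below is a small check using Python's `fractions.Fraction`; the function name is illustrative. After $K$ steps the removed length is $1 - (2/3)^K$, so the length that remains shrinks to zero.

```python
from fractions import Fraction

def removed_length(steps):
    """Total length removed after the given number of middle-third deletions.

    At the k-th step, 2**(k-1) open intervals of length 1/3**k are deleted.
    """
    return sum(Fraction(2 ** (k - 1), 3 ** k) for k in range(1, steps + 1))

for steps in (1, 2, 5, 20):
    removed = removed_length(steps)
    print(steps, removed, float(1 - removed))
```

The remaining length $(2/3)^K$ is the measure of the union of the $2^K$ closed intervals left after $K$ steps, each of which still contains the Cantor set.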


6.3 Important Theorems for Lebesgue Integrals 


6.3.1 Monotone Convergence Theorem 


Our current task is to examine whether or not the equality

$$\lim_{n \to \infty} \int_X f_n(x)\,d\mu = \int_X f(x)\,d\mu \tag{6.23}$$

is valid under the Lebesgue procedure. This problem can be clarified by
referring to two important theorems concerning the convergence property of
Lebesgue integrals: the monotone convergence theorem and the
dominated convergence theorem. Neither theorem is valid if we restrict our
attention to Riemann integrable functions. We observe that, owing to the
two convergence theorems, Lebesgue theory offers a considerable improvement
over Riemann theory with regard to convergence properties.

In what follows, we assume that $X$ is a set of real numbers, and that $\{f_n\}$
is a sequence of functions defined on $X$.


♠ Monotone convergence theorem:
If $\{f_n\}$ is a sequence such that $0 \le f_n \le f_{n+1}$ for all $n \ge 1$ in $X$ and
$f = \lim_{n \to \infty} f_n$, then

$$\lim_{n \to \infty} \int_X f_n\,d\mu = \int_X \lim_{n \to \infty} f_n\,d\mu = \int_X f\,d\mu.$$


Remark. The monotone convergence theorem states that in the case of
Lebesgue integrals, the conditions to reverse the order of limit and integration
are much weaker than in the case of Riemann integrals; i.e., only the
pointwise convergence of $f_n(x)$ to $f(x)$ is required in the Lebesgue case, whereas
in the Riemann case we must have uniform convergence of $f_n(x)$ to $f(x)$.
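A numerical illustration of the theorem may help. The sketch below uses an example that is not from the text: the truncations $f_n(x) = \min(n, x^{-1/2})$ on $(0,1]$, which increase pointwise to the unbounded function $f(x) = x^{-1/2}$. Each $f_n$ is bounded and piecewise continuous, so a midpoint sum is used here as a stand-in for its integral; the exact values $2 - 1/n$ follow from elementary calculus.

```python
def f_n(n, x):
    """Truncation min(n, x**-0.5) of the unbounded function x**-0.5."""
    return min(float(n), x ** -0.5)

def midpoint_integral(g, a, b, cells):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    width = (b - a) / cells
    return sum(g(a + (j + 0.5) * width) for j in range(cells)) * width

levels = (1, 2, 4, 8)
integrals = [
    midpoint_integral(lambda x, n=n: f_n(n, x), 0.0, 1.0, 100_000)
    for n in levels
]

for n, value in zip(levels, integrals):
    print(n, value, 2 - 1 / n)
```

The integrals increase with $n$ and approach $2$, the integral of the limit function, even though the convergence of $f_n$ to $f$ is not uniform near $x = 0$.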




Proof (of the monotone convergence theorem): The hypothesis
$0 \le f_n \le f_{n+1}$ implies that

$$0 \le \int_X f_n \, d\mu \le \int_X f_{n+1} \, d\mu,$$

which indicates that the sequence $\{\int_X f_n \, d\mu\}$ increases monotonically
with respect to n; thus its limit as $n \to \infty$ exists, and we denote it by M
(possibly equal to $\infty$). In addition, by hypothesis

$$\int_X f_n \, d\mu \le \int_X f \, d\mu \quad \text{for all } n.$$ (6.24)

Since (6.24) is true for arbitrary n, we have

$$M = \lim_{n\to\infty} \int_X f_n \, d\mu \le \int_X f \, d\mu.$$

Therefore, if we can verify the opposite inequality

$$M \ge \int_X f \, d\mu$$ (6.25)

we will get the desired result,

$$M = \lim_{n\to\infty} \int_X f_n \, d\mu = \int_X \lim_{n\to\infty} f_n \, d\mu = \int_X f \, d\mu.$$

To show (6.25), let c be a number such that $c \in (0, 1)$ and introduce
the point set

$$X_n = \{x : c f(x) \le f_n(x)\}.$$

Owing to the monotonically increasing property of the sequence
$\{f_n(x)\}$ with regard to n, the set $X_n$ satisfies the inclusion relation

$$X_1 \subset X_2 \subset X_3 \subset \cdots \quad \text{and} \quad \bigcup_{n=1}^{\infty} X_n = X.$$

In addition, the increasing property of the sequence $\{\int_{X_n} f_n \, d\mu\}$ yields

$$c \int_{X_n} f \, d\mu \le \int_{X_n} f_n \, d\mu \le \lim_{n\to\infty} \int_X f_n \, d\mu = M.$$ (6.26)

Since (6.26) must hold for any n, we have

$$c \int_X f \, d\mu \le M.$$ (6.27)

Furthermore, since (6.27) is true for all $c \in (0, 1)$, we have

$$\int_X f \, d\mu \le M.$$ (6.28)

Note that letting $c \to 1$ in (6.27) is allowed because the non-strict
symbol $\le$, not $<$, is involved in (6.27). ∎




160 6 Lebesgue Integrals 


6.3.2 Dominated Convergence Theorem (I) 


In the previous argument, we saw that the order of limit and integration can 
be reversed when considering monotonically increasing sequences of functions. 
In practice, however, the requirement in the monotone convergence theorem, 
i.e., that the sequence $\{f_n(x)\}$ must be monotonically increasing, is sometimes
very inconvenient. In this subsection, we examine the same issue for more 
general sequences of functions, i.e., nonmonotone sequences satisfying some 
looser conditions and their limit passage. Our current objective is to prove 
the theorem below. 


@ Dominated convergence theorem: 

Let $\{f_n\}$ be a sequence of functions defined almost everywhere on X such
that (a) $\lim_{n\to\infty} f_n(x) = f(x)$, and (b) there exists a nonnegative
integrable function g such that $|f_n| \le g$ for all $n \ge 1$. Then, we have

$$\lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$


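Remark. Note that the condition imposed on the theorem above is that the
sequence $\{f_n\}$ should be bounded almost everywhere by an integrable function.
No monotonicity is required, so in this respect the condition is looser than
that imposed in the monotone convergence theorem. Indeed, when the limit f of
a nonnegative increasing sequence is integrable, g = f dominates the sequence,
and the monotone convergence theorem can be regarded as a special case of the
dominated convergence theorem.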
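
A small numerical sketch of the dominated convergence theorem follows. The sequence $f_n(x) = x^n$ on $[0, 1]$ is an illustrative choice made here, not an example from the text: it is dominated by the integrable function $g(x) = 1$ and converges to 0 for every $x < 1$, i.e., almost everywhere. The helper name `midpoint_integral` is hypothetical.

```python
def midpoint_integral(func, a, b, steps=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


orders = (1, 10, 100, 1000)
# Integrals of f_n(x) = x**n over [0, 1]; each f_n satisfies |f_n| <= g = 1.
dct_integrals = [midpoint_integral(lambda x, n=n: x ** n, 0.0, 1.0) for n in orders]
# Closed form 1/(n+1), which tends to 0, the integral of the a.e. limit f = 0.
dct_exact = [1.0 / (n + 1) for n in orders]

for n, value in zip(orders, dct_integrals):
    print(n, value)
```

The printed values shrink toward 0, in agreement with $\int_0^1 \lim_{n\to\infty} f_n \, d\mu = 0$.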


6.3.3 Fatou Lemma 


The proof of the dominated convergence theorem requires the lemma given 
below. 


@ Fatou lemma: 
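If $f_n(x) \ge 0$ for all n and for almost every x in a bounded measurable
set X and if $\lim_{n\to\infty} f_n(x) = f(x)$, then

$$\int_X \left[\liminf_{n\to\infty} f_n\right] d\mu = \int_X f \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu,$$

where the definition is

$$\liminf_{n\to\infty} f_n = \lim_{n\to\infty} \inf_{k \ge n} f_k.$$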




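Proof Let $g_n = \inf_{k \ge n} f_k$. Since the sequence $g_n(x)$ is nonnegative and
nondecreasing, we have

$$\lim_{n\to\infty} g_n = \liminf_{n\to\infty} f_n.$$

(See Sect. 2.1.4 for the precise definition of lim inf.) In addition, the monotone
convergence theorem implies that

$$\lim_{n\to\infty} \int_X g_n \, d\mu = \int_X \lim_{n\to\infty} g_n \, d\mu = \int_X \liminf_{n\to\infty} f_n \, d\mu.$$ (6.29)

It also follows that

$$g_n(x) \le f_k(x) \quad \text{for any } k \ge n.$$

Hence,

$$\int_X g_n \, d\mu \le \int_X f_k \, d\mu \quad \text{for any } k \ge n,$$

that is,

$$\int_X g_n \, d\mu \le \inf_{k \ge n} \int_X f_k \, d\mu.$$

Taking the limit $n \to \infty$ and applying the monotone convergence theorem,
we get

$$\lim_{n\to\infty} \int_X g_n \, d\mu \le \lim_{n\to\infty} \left[\inf_{k \ge n} \int_X f_k \, d\mu\right] = \liminf_{n\to\infty} \int_X f_n \, d\mu.$$ (6.30)

From (6.29) and (6.30), we conclude that

$$\int_X \liminf_{n\to\infty} f_n \, d\mu = \int_X f \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu.$$ ∎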


6.3.4 Dominated Convergence Theorem (II)

Our next task is to prove the dominated convergence theorem. 


Proof Observe that $f_n$ and f are Lebesgue integrable on X. From hypothesis,
it follows that $f_n + g \ge 0$ and $g - f_n \ge 0$ almost everywhere. Thus by Fatou's
lemma, we have

$$\int_X \liminf_{n\to\infty} (f_n + g) \, d\mu \le \liminf_{n\to\infty} \int_X (f_n + g) \, d\mu$$

or

$$\int_X \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu$$ (6.31)

by the linearity of the Lebesgue integral. It is also true that $g - f_n \ge 0$ on X;
thus also by Fatou's lemma we have

$$\int_X \liminf_{n\to\infty} (g - f_n) \, d\mu \le \liminf_{n\to\infty} \int_X (g - f_n) \, d\mu,$$

or equivalently (recall that $\lim f_n$ exists, so that $\liminf (-f_n) = -\liminf f_n$),

$$-\int_X \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \left(-\int_X f_n \, d\mu\right).$$

The latter inequality can be rewritten as

$$\int_X \liminf_{n\to\infty} f_n \, d\mu \ge \limsup_{n\to\infty} \int_X f_n \, d\mu.$$ (6.32)

From (6.31) and (6.32) we get

$$\int_X \liminf_{n\to\infty} f_n \, d\mu \le \liminf_{n\to\infty} \int_X f_n \, d\mu \le \limsup_{n\to\infty} \int_X f_n \, d\mu \le \int_X \liminf_{n\to\infty} f_n \, d\mu,$$

which clearly indicates that

$$\liminf_{n\to\infty} \int_X f_n \, d\mu = \limsup_{n\to\infty} \int_X f_n \, d\mu,$$

so that the limit $\lim_{n\to\infty} \int_X f_n \, d\mu$ exists and is equal to
$\int_X \lim_{n\to\infty} f_n \, d\mu = \int_X f \, d\mu$. This completes the proof of the theorem. ∎


6.3.5 Fubini Theorem 


For a function of several variables, we may define the Lebesgue integral by 
exactly the same process as for a function of one variable. In cases of two 
variables, for instance, a rectangle S = [a, b] x [c,d] takes on the role of inter- 
vals, and we need only to imitate the definitions and methods that we used 
for functions of a single variable. We can develop the theory for the entire 
plane $R^2$ analogously to that for the real axis R. In fact, all the consequences
in Sect. 6.2 for Lebesgue integrable functions on a closed interval [a, b] are
easily carried over to the corresponding propositions for the double integral
on the rectangle S without modifying the actual proofs in Sect. 6.2, except
for replacing f(x) by f(x, y).

However, an important new problem arises here. If f is integrable on the
rectangle $S = [a, b] \times [c, d]$, we have to determine whether the value of the
integral

$$\iint_S f(x, y) \, dx \, dy$$ (6.33)

is equal to that of the repeated integrals

$$\int_c^d \left[\int_a^b f(x, y) \, dx\right] dy \quad \text{and} \quad \int_a^b \left[\int_c^d f(x, y) \, dy\right] dx.$$

This is true for continuous functions on S. But it is far from obvious that
the existence of the double integral (6.33) guarantees the existence of either
repeated integral.

The following example may lead the reader to consider the point mentioned 
above. 


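Example. Consider the function

$$f(x, y) = \begin{cases} \dfrac{x^2 - y^2}{(x^2 + y^2)^2} & \text{for } (x, y) \ne (0, 0), \\ 0 & \text{for } (x, y) = (0, 0), \end{cases}$$ (6.34)

and compute the repeated integrals

$$I_1 = \int_0^1 dy \left[\int_0^1 f(x, y) \, dx\right] \quad \text{and} \quad I_2 = \int_0^1 dx \left[\int_0^1 f(x, y) \, dy\right].$$

Straightforward calculations yield

$$\int_0^1 dx \int_0^1 \frac{\partial}{\partial y}\left(\frac{y}{x^2 + y^2}\right) dy = \int_0^1 \frac{dx}{x^2 + 1} = \frac{\pi}{4},$$

$$\int_0^1 dy \int_0^1 \frac{\partial}{\partial x}\left(\frac{-x}{x^2 + y^2}\right) dx = \int_0^1 \frac{-dy}{y^2 + 1} = -\frac{\pi}{4}.$$

Hence, we conclude that

$$I_1 = -\frac{\pi}{4} \quad \text{and} \quad I_2 = \frac{\pi}{4}, \quad \text{so that} \quad I_1 \ne I_2,$$

which indicates that the order of integrations with respect to x and y cannot
be changed.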
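
The two repeated integrals of the function (6.34) can be checked numerically with a short sketch. The inner integrals are taken from the antiderivatives $-x/(x^2+y^2)$ and $y/(x^2+y^2)$; one of them is also cross-checked by direct quadrature at a sample value $y = 0.5$. The helper names (`midpoint_integral`, `inner_over_x`, `inner_over_y`) are hypothetical, and the midpoint rule is only an approximation.

```python
import math


def f(x, y):
    """The function (6.34): (x^2 - y^2) / (x^2 + y^2)^2, with f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return (x * x - y * y) / (x * x + y * y) ** 2


def midpoint_integral(func, a, b, steps=100_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


def inner_over_x(y):
    """Closed form of the inner integral of f over x in [0, 1], for y > 0."""
    return -1.0 / (1.0 + y * y)


def inner_over_y(x):
    """Closed form of the inner integral of f over y in [0, 1], for x > 0."""
    return 1.0 / (1.0 + x * x)


# Cross-check one inner integral against direct quadrature at y = 0.5.
numeric_inner = midpoint_integral(lambda x: f(x, 0.5), 0.0, 1.0)

I1 = midpoint_integral(inner_over_x, 0.0, 1.0)  # integrate over x first, then y
I2 = midpoint_integral(inner_over_y, 0.0, 1.0)  # integrate over y first, then x
print(I1, I2, math.pi / 4)
```

The two orders of integration give values close to $-\pi/4$ and $+\pi/4$, respectively.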


We now present the main theorem of this subsection. 


@ Fubini theorem: 
Let the function f(x, y) be integrable on a rectangle $S = [a, b] \times [c, d]$. Then
the following equalities hold:

$$\iint_S f(x, y) \, dx \, dy = \int_a^b dx \int_c^d f(x, y) \, dy = \int_c^d dy \int_a^b f(x, y) \, dx.$$




According to Fubini's theorem, a double integral $\iint_S f(x, y) \, dx \, dy$ is computed
by integrating first with respect to x and then with respect to y, or vice versa. 
We omit an exact proof of the Fubini theorem, since it requires rather lengthy 
arguments regarding the existence and the convergence of the double integrals. 
Instead, we present some applications of the theorem. 

The following is an extension of the Fubini theorem: 


@ Fubini—Hobson—Tonelli theorem: 
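Let the function f(x, y) be defined on $S = [a, b] \times [c, d]$. Then, if either of
the repeated integrals

$$\int_a^b \left[\int_c^d |f(x, y)| \, dy\right] dx \quad \text{or} \quad \int_c^d \left[\int_a^b |f(x, y)| \, dx\right] dy$$

exists, f is integrable on S and, hence,

$$\iint_S f(x, y) \, dx \, dy = \int_a^b dx \int_c^d f(x, y) \, dy = \int_c^d dy \int_a^b f(x, y) \, dx.$$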


Both the Fubini theorem and the Fubini-Hobson-Tonelli theorem for integrals
on a rectangle S may be easily extended to integrals on all of $R^2$ or to the
integrals on any measurable subsets of $R^2$.


Exercises 


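1. Suppose that the function

$$g_k(x) = -2k^2 x e^{-k^2 x^2} + 2(k+1)^2 x e^{-(k+1)^2 x^2}$$

is defined on $[0, \infty)$, and form the sum

$$f_n(x) = \sum_{k=1}^{n} g_k(x) = -2x e^{-x^2} + 2(n+1)^2 x e^{-(n+1)^2 x^2}.$$

Show that $\int_0^\infty \lim_{n\to\infty} f_n(x) \, dx \ne \lim_{n\to\infty} \int_0^\infty f_n(x) \, dx$.


Solution: We have

$$\int_0^\infty \lim_{n\to\infty} f_n(x) \, dx = \int_0^\infty \left(-2x e^{-x^2}\right) dx = \left[e^{-x^2}\right]_0^\infty = -1,$$

whereas

$$\lim_{n\to\infty} \int_0^\infty f_n(x) \, dx = \lim_{n\to\infty} \left(\left[e^{-x^2}\right]_0^\infty - \left[e^{-(n+1)^2 x^2}\right]_0^\infty\right) = 0.$$

Therefore, (6.23) is not valid. ∎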




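2. Given the function

$$f_n(x) = \begin{cases} n \sin nx & \text{for } 0 \le x \le \pi/n, \\ 0 & \text{for } \pi/n \le x \le \pi, \end{cases}$$

show that

$$\lim_{n\to\infty} \int_0^\pi f_n(x) \, dx \ne \int_0^\pi \lim_{n\to\infty} f_n(x) \, dx.$$


Solution: We have $\lim_{n\to\infty} f_n(x) = 0$ for every x in $[0, \pi]$ and
$\lim_{n\to\infty} \int_0^\pi f_n(x) \, dx = 2$. Hence, we obtain the desired result. ∎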
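
The statement of Exercise 2 can be checked numerically with a short sketch. It evaluates $f_n(x) = n \sin nx$ on $[0, \pi/n]$ (zero elsewhere) and approximates the integrals with the midpoint rule; the helper name `midpoint_integral` and the sample point $x = 0.5$ are choices made here for illustration.

```python
import math


def f_n(n, x):
    """f_n(x) = n*sin(n*x) on [0, pi/n] and 0 on the rest of [0, pi]."""
    return n * math.sin(n * x) if 0.0 <= x <= math.pi / n else 0.0


def midpoint_integral(func, a, b, steps=200_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


# The integral over [0, pi] stays near 2 for every n ...
spike_integrals = [
    midpoint_integral(lambda x, n=n: f_n(n, x), 0.0, math.pi) for n in (1, 10, 100)
]
# ... while at a fixed point such as x = 0.5 the values vanish once pi/n < 0.5.
pointwise_values = [f_n(n, 0.5) for n in (10, 100, 1000)]
print(spike_integrals, pointwise_values)
```

No integrable function dominates all the $f_n$ here, which is why the dominated convergence theorem does not apply.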


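3. Suppose that the nonnegative functions $\{f_n(x) : n \in N\}$ are each
summable over a measurable set X, and $f_n \ge f_{n+1}$ on X. Show that
the limit function $f = \lim_{n\to\infty} f_n$ is summable over X and that

$$\lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$


Solution: Let $g_n = f_1 - f_n$, so that $0 = g_1 \le g_2 \le \cdots \le f_1$. Thus,
the dominated convergence theorem ensures that $\lim_{n\to\infty} g_n = f_1 - f$
is integrable, and we have
$\lim_{n\to\infty} \int_X (f_1 - f_n) \, d\mu = \int_X (f_1 - f) \, d\mu$,
which gives

$$\int_X f_1 \, d\mu - \lim_{n\to\infty} \int_X f_n \, d\mu = \int_X (f_1 - f) \, d\mu.$$

Further, as f is integrable since $0 \le f \le f_1$, we have

$$\int_X (f_1 - f) \, d\mu = \int_X f_1 \, d\mu - \int_X f \, d\mu,$$

so that

$$\int_X f_1 \, d\mu - \lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f_1 \, d\mu - \int_X f \, d\mu,$$

which gives

$$\lim_{n\to\infty} \int_X f_n \, d\mu = \int_X f \, d\mu.$$ ∎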


4. Examine the applicability to integrals $\int_0^\infty f_n(x) \, dx$ of dominated and
monotone convergence theorems for the following: (i) $f_n(x) = 2n^2 e^{-n^2 x^2}$;
(ii) $f_n(x) = n x e^{-n x^2}$.




Solution:
(i) Setting y = nx, we have

$$\int_0^\infty f_n(x) \, dx = \int_0^\infty 2n^2 e^{-n^2 x^2} \, dx = \int_0^\infty 2n e^{-y^2} \, dy = n\sqrt{\pi},$$

where the last term diverges as $n \to \infty$. Hence, the limit
$\lim_{n\to\infty} \int_0^\infty f_n(x) \, dx$ does not exist. Next, we observe that for $x \ne 0$,

$$\lim_{n\to\infty} f_n(x) = \lim_{n\to\infty} \left(2n^2 e^{-n^2 x^2}\right) = 0,$$

whereas for x = 0,

$$\lim_{n\to\infty} f_n(0) = \lim_{n\to\infty} \left(2n^2\right) = \infty.$$

Thus, there is no limiting function $f = \lim_{n\to\infty} f_n$ that satisfies
the inequality $f(x) \ge 2n^2 e^{-n^2 x^2}$ for all n in X, and we
can conclude that neither the dominated nor the monotone
convergence theorem is applicable.


(ii) It is found that

$$\int_0^\infty f_n(x) \, dx = \int_0^\infty n x e^{-n x^2} \, dx = \left[-\frac{1}{2} e^{-n x^2}\right]_0^\infty = \frac{1}{2},$$

and that $n x e^{-n x^2} \to 0$ pointwise as $n \to \infty$. Therefore, the limiting
function f(x) satisfying the inequality $f(x) \ge n x e^{-n x^2}$ does
not exist. Hence, neither the dominated nor the monotone convergence
theorem is applicable. ∎


5. Using Fubini's theorem, derive the formula

$$\int_0^1 \frac{x^b - x^a}{\log x} \, dx = \log \frac{1 + b}{1 + a}.$$ (6.35)


Solution: Note that the integral on the left-hand side is beyond
elementary calculus, so that it is impossible to achieve (6.35) by
straightforward calculations. Instead, we observe that

$$\int_0^1 dx \int_a^b x^y \, dy = \int_0^1 \frac{x^b - x^a}{\log x} \, dx$$

and

$$\int_a^b dy \int_0^1 x^y \, dx = \int_a^b \frac{dy}{y + 1} = \log \frac{1 + b}{1 + a}.$$

Thus, when we apply the Fubini theorem to the double integral

$$\iint_{0 \le x \le 1, \; a \le y \le b} x^y \, dx \, dy,$$

we obtain the desired result (6.35). ∎
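
Formula (6.35) can be sanity-checked numerically. The sketch below assumes the sample values a = 1 and b = 2 (chosen here, not given in the text) and approximates the left-hand side with the midpoint rule, which never evaluates the integrand at the endpoints x = 0 or x = 1 where $\log x$ is singular or zero. The helper names are hypothetical.

```python
import math


def integrand(x, a, b):
    """Integrand of (6.35): (x^b - x^a) / log(x) for 0 < x < 1."""
    return (x ** b - x ** a) / math.log(x)


def midpoint_integral(func, lower, upper, steps=200_000):
    """Approximate the integral of func over [lower, upper] with the midpoint rule."""
    h = (upper - lower) / steps
    return h * sum(func(lower + (i + 0.5) * h) for i in range(steps))


a, b = 1.0, 2.0  # sample exponents, an assumption made for this check
lhs = midpoint_integral(lambda x: integrand(x, a, b), 0.0, 1.0)
rhs = math.log((1.0 + b) / (1.0 + a))
print(lhs, rhs)
```

For these values both sides are close to $\log(3/2)$.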


6. Show that the function f(x, y) given in (6.34) in Sect. 6.3.5 is not
integrable on $[0, 1] \times [0, 1]$.


Solution: It follows that

$$\iint_{[0,1] \times [0,1]} \left|\frac{x^2 - y^2}{(x^2 + y^2)^2}\right| dx \, dy \ge \iint_{0 \le x \le y \le 1} \frac{y^2 - x^2}{(x^2 + y^2)^2} \, dx \, dy = \int_0^1 dy \int_0^y \frac{y^2 - x^2}{(x^2 + y^2)^2} \, dx = \int_0^1 \frac{dy}{2y} = \infty.$$

This means that the existence of the two repeated integrals does not
guarantee the existence of the double integral. ∎


6.4 The Lebesgue Spaces $L^p$


6.4.1 The Spaces of $L^p$


We close this chapter by demonstrating the relevance of the Lebesgue integral
theory to the functional analysis that we discussed in Chap. 4. The
Lebesgue theory on integration enables us to introduce certain spaces of
functions that have properties that are of great importance in analysis as well
as in mathematical physics, in particular, quantum mechanics. These are the
so-called $L^p$ spaces of complex-valued functions f such that $|f|^p$ is integrable.

We have already dealt with the concept of Hilbert space. In fact, $L^2$ for
any measure $\mu$ satisfies the conditions for a Hilbert space. We begin with a
short review of the definition of $L^p$ spaces in terms of measure, and follow this
by examining how the spaces possess vector space properties owing to the use
of the Lebesgue integral.

Let p be a positive real number and let X be a measurable set in R. The
$L^p$ space is defined as follows:


@ Definition of $L^p$ space:
The $L^p$ space is a set of complex-valued Lebesgue measurable functions
f(x) on X that satisfy

$$\int_X |f|^p \, d\mu < \infty$$

for $p \ge 1$.




When the integral $\int_X |f(x)|^p \, d\mu$ exists, we define the p-norm of f from it
and denote it by

$$\|f\|_p = \left(\int_X |f|^p \, d\mu\right)^{1/p}.$$

Clearly for p = 2, the present definition reduces to our earlier definition of $L^2$.


6.4.2 Hölder Inequality

The following two inequalities are fundamental results that demonstrate the
relations between the norms of functions involved in $L^p$.


@ Hölder inequality:
For any $f \in L^p$ and $g \in L^q$ under the conditions

$$p, q > 1 \quad \text{and} \quad \frac{1}{p} + \frac{1}{q} = 1,$$

we have

$$fg \in L^1 \quad \text{and} \quad \|fg\|_1 \le \|f\|_p \|g\|_q.$$


Proof We assume that neither f nor g is zero almost everywhere (otherwise,
the result is trivial). To proceed with the proof, we first observe the inequality

$$a^{1/p} b^{1/q} \le \frac{a}{p} + \frac{b}{q} \quad \text{for } a, b \ge 0,$$ (6.36)

which we justify by rewriting it (for b > 0) as

$$t^{1/p} \le \frac{t}{p} + \frac{1}{q},$$

where we set t = a/b. Then, we note that the function given by

$$\phi(t) = t^{1/p} - \frac{t}{p} - \frac{1}{q}$$

has a maximum at t = 1, namely,

$$\max \phi(t) = \phi(1) = 1 - \frac{1}{p} - \frac{1}{q} = 0,$$

which results in the inequality (6.36), which we use to obtain

$$\frac{|f(x) g(x)|}{AB} \le \frac{|f(x)|^p}{p A^p} + \frac{|g(x)|^q}{q B^q},$$ (6.37)

where

$$A = \left[\int_X |f|^p \, d\mu\right]^{1/p} \quad \text{and} \quad B = \left[\int_X |g|^q \, d\mu\right]^{1/q}.$$

The right-hand side of (6.37) is integrable from the hypothesis that $f \in L^p$
and $g \in L^q$. Therefore, using (6.37) we obtain

$$\frac{1}{AB} \int_X |fg| \, d\mu \le \frac{1}{p A^p} \int_X |f|^p \, d\mu + \frac{1}{q B^q} \int_X |g|^q \, d\mu = \frac{1}{p} + \frac{1}{q} = 1.$$

Consequently, we have

$$\int_X |fg| \, d\mu \le AB,$$

which proves the inequality. ∎

6.4.3 Minkowski Inequality 


The other inequality of interest is stated below. 


@ Minkowski inequality: 
If $f, g \in L^p$ with $p \ge 1$, then

$$f + g \in L^p \quad \text{and} \quad \|f + g\|_p \le \|f\|_p + \|g\|_p.$$ (6.38)


Proof For p = 1, the inequality is readily obtained by integrating the triangle
inequality for real numbers. For p > 1, it follows that

$$\int_X |f + g|^p \, d\mu \le \int_X |f + g|^{p-1} |f| \, d\mu + \int_X |f + g|^{p-1} |g| \, d\mu.$$

Let q > 0 be such that

$$\frac{1}{p} + \frac{1}{q} = 1.$$

Applying the Hölder inequality to each of these last two integrals and noting
that (p − 1)q = p, gives us

$$\int_X |f(x) + g(x)|^p \, d\mu \le M \left[\int_X |f + g|^{(p-1)q} \, d\mu\right]^{1/q} = M \left[\int_X |f + g|^p \, d\mu\right]^{1/q},$$ (6.39)

where M denotes the right-hand side of the inequality (6.38) that we would
like to prove. Now divide the extreme ends of the relation (6.39) by

$$\left[\int_X |f + g|^p \, d\mu\right]^{1/q}$$

to obtain the desired result. ∎


Remark. It should be noted that neither the Hölder inequality nor the
Minkowski inequality holds for 0 < p < 1 if $\mu(X) > 0$, which is why we
restrict ourselves to $p \ge 1$.
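
Both inequalities can be illustrated on finite sequences, which correspond to $L^p$ spaces over a finite set with the counting measure. This setting, the exponent p = 3, the random test vectors, and the helper `p_norm` are assumptions made for the sketch; it is a spot check, not a proof.

```python
import random


def p_norm(values, p):
    """Discrete p-norm: (sum |v|^p)^(1/p), the L^p norm for the counting measure."""
    return sum(abs(v) ** p for v in values) ** (1.0 / p)


random.seed(0)
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1
f_vals = [random.uniform(-1.0, 1.0) for _ in range(1000)]
g_vals = [random.uniform(-1.0, 1.0) for _ in range(1000)]

# Hoelder: ||fg||_1 <= ||f||_p * ||g||_q
holder_lhs = sum(abs(a * b) for a, b in zip(f_vals, g_vals))
holder_rhs = p_norm(f_vals, p) * p_norm(g_vals, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
mink_lhs = p_norm([a + b for a, b in zip(f_vals, g_vals)], p)
mink_rhs = p_norm(f_vals, p) + p_norm(g_vals, p)

# For p = 1/2 the triangle inequality fails already for two unit vectors.
half_lhs = p_norm([1.0, 1.0], 0.5)
half_rhs = p_norm([1.0, 0.0], 0.5) + p_norm([0.0, 1.0], 0.5)
print(holder_lhs <= holder_rhs, mink_lhs <= mink_rhs, half_lhs > half_rhs)
```

The last comparison shows concretely why the case 0 < p < 1 is excluded: the would-be norm gives 4 for the sum but only 1 + 1 for the two summands.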


6.4.4 Completeness of $L^p$ Spaces


By virtue of the two inequalities discussed above, we can show the
completeness property of $L^p$ spaces, which is crucially important for developing
Hilbert space theory for Lebesgue measurable functions.


@ Completeness of $L^p$ spaces:
The space $L^p$ is complete: i.e., for any $f_n \in L^p$ satisfying

$$\lim_{n, m \to \infty} \|f_n - f_m\|_p = 0,$$

there exists $f \in L^p$ such that

$$\lim_{n\to\infty} \|f_n - f\|_p = 0.$$


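Proof Let $\{f_n\}$ be a Cauchy sequence in $L^p$. Then, there is a natural
number $n_1$ such that for all $n > n_1$, we have

$$\|f_n - f_{n_1}\| < \frac{1}{2}.$$

By induction, after finding $n_{k-1} > n_{k-2}$, we find $n_k > n_{k-1}$ such that for all
$n > n_k$ we have

$$\|f_n - f_{n_k}\| < \frac{1}{2^k}.$$

Then $\{f_{n_k}\}$ is a subsequence of $\{f_n\}$ that satisfies

$$\|f_{n_{k+1}} - f_{n_k}\| < \frac{1}{2^k}$$

or

$$\|f_{n_1}\| + \sum_{k=1}^{\infty} \|f_{n_{k+1}} - f_{n_k}\| = A < \infty.$$

Let

$$g_k = |f_{n_1}| + |f_{n_2} - f_{n_1}| + \cdots + |f_{n_{k+1}} - f_{n_k}|, \quad k = 1, 2, \cdots.$$

Then, by the Minkowski inequality,

$$\int_X g_k^p(x) \, d\mu = \int_X \left(|f_{n_1}| + |f_{n_2} - f_{n_1}| + \cdots + |f_{n_{k+1}} - f_{n_k}|\right)^p d\mu \le \left(\|f_{n_1}\|_p + \sum_{k=1}^{\infty} \|f_{n_{k+1}} - f_{n_k}\|_p\right)^p \le A^p < \infty.$$

Let $g = \lim g_k$. Then $g^p = \lim g_k^p$. By the monotone convergence theorem
given in Sect. 6.3.1, we have

$$\int_X g^p \, d\mu = \lim_{k\to\infty} \int_X g_k^p \, d\mu < \infty,$$

which shows that g is in $L^p$, and hence

$$\int_X \left(|f_{n_1}| + \sum_{k=1}^{\infty} |f_{n_{k+1}} - f_{n_k}|\right)^p d\mu < \infty,$$

implying that

$$f_{n_1} + \sum_{k=1}^{\infty} \left(f_{n_{k+1}} - f_{n_k}\right)$$

converges almost everywhere to a function $f \in L^p$.
It remains to prove that $\|f_{n_k} - f\| \to 0$ as $k \to \infty$. We first note that

$$f(x) - f_{n_j}(x) = \sum_{k=j}^{\infty} \left[f_{n_{k+1}}(x) - f_{n_k}(x)\right].$$

It then follows that

$$\|f - f_{n_j}\|_p \le \sum_{k=j}^{\infty} \|f_{n_{k+1}} - f_{n_k}\|_p < \sum_{k=j}^{\infty} \frac{1}{2^k} = \frac{1}{2^{j-1}}.$$

Therefore, $\|f - f_{n_j}\|_p \to 0$ as $j \to \infty$. Now

$$\|f_n - f\|_p \le \|f_n - f_{n_k}\|_p + \|f_{n_k} - f\|_p,$$

where $\|f_n - f_{n_k}\|_p \to 0$ as $n \to \infty$ and $k \to \infty$ and thus $\|f_n - f\|_p \to 0$
as $n \to \infty$. This shows that the Cauchy sequence $\{f_n\}$ converges to f
in $L^p$. ∎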




Before closing this chapter, we must emphasize that if we employ the Riemann
integral to construct $L^p$ spaces, the theorem mentioned above breaks down so
that we can no longer expect completeness of the resulting function space. To
illustrate this point, we temporarily define the '$L^1$ space' by a set of Riemann
integrable functions under the '1-norm':

$$\|f\|_1^R = \int_0^1 |f(x)| \, dx.$$

We then consider a function

$$f_n(x) = \begin{cases} 1 & \text{for } x \in \{a_1, a_2, \cdots, a_n\}, \\ 0 & \text{otherwise}, \end{cases}$$

where $\{a_n\}$ (n = 1, 2, ···) is an infinite sequence of all rational numbers
in [0, 1]. It readily follows that a function $f_n(x) - f_m(x) \in L^1$ is Riemann
integrable and reads

$$\|f_n - f_m\|_1^R = \int_0^1 |f_n(x) - f_m(x)| \, dx = 0.$$

Nevertheless, $f_n(x)$ converges to Dirichlet's function $\chi(x)$, which is not
Riemann integrable as noted earlier. As it is impossible to examine the
quantity

$$\|f_n - \chi\|_1^R$$

using Riemann integrals, we cannot establish the complete function space
based on that method.


6.5 Applications in Physics and Engineering 


6.5.1 Practical Significance of Lebesgue Integrals 


From a practical viewpoint, what makes Lebesgue integrals so important is
the fact that they allow us to interchange the order of integration and other
limiting procedures under very weak conditions, which is not possible in the
case of Riemann integrals. In fact, in the case of Riemann integrals, the
identities

$$\lim_{n\to\infty} \int f_n(x) \, dx = \int \lim_{n\to\infty} f_n(x) \, dx$$

and

$$\sum_{n=1}^{\infty} \int f_n(x) \, dx = \int \sum_{n=1}^{\infty} f_n(x) \, dx$$

are guaranteed only if the integrands on the right-hand side, i.e., $\lim f_n$ and
$\sum f_n$, are uniformly convergent. Such a restriction can be removed by using a
Lebesgue integral since with the latter, only pointwise convergence of the
integrand (together with the monotonicity or domination hypotheses of
Sect. 6.3) is needed. We saw in Sect. 6.3 that the Lebesgue convergence theorem
and Fubini's theorem markedly weaken the conditions necessary for the
validity of an interchange of the order of integration. As a result, we need
not monitor the order of the limiting procedure, which is very useful in the
practical calculations encountered in physics and engineering.


6.5.2 Contraction Mapping 


Another important consequence of Lebesgue integral theory is the completeness
of the function space $L^p$ spanned by Lebesgue integrable functions.
$L^p$ spaces have a wide range of applications in physics, statistics, engineering,
and other disciplines. For instance, they serve as a basis in the development of
a rigorous theory of Fourier transformation, in which the mappings between
two different $L^p$ spaces are considered. Moreover, the theory of quantum
mechanics is established on the basis of the $L^2$ space, a specific class of $L^p$ spaces
with p = 2. In both applications, the completeness property of the $L^p$ space
plays a crucial role in making the theory self-contained. In order for the reader
to learn more about this issue, we present the contraction mapping theorem
(or Banach's fixed point theorem) below. This theorem proves the
existence of a unique solution to a certain kind of equation associated with
Lebesgue integrable functions, which makes the theory based on $L^p$ spaces
self-contained.
A preliminary terminology is defined below.


@ Contraction mapping: 
A contraction mapping T is a mapping from $L^p$ into $L^p$ that satisfies
the relation

$$\|T(f) - T(g)\| \le c \|f - g\| \quad (0 < c < 1)$$ (6.40)

for any $f, g \in L^p$ (see Fig. 6.6).


Fig. 6.6. Sketch of a contraction mapping T acting on $f, g \in L^p$




Remark. If T is regarded as a differential operator acting on a Lebesgue inte- 
grable function f, then we can say that ‘a contraction mapping is a mapping 
that satisfies the Lipschitz condition’ (see Sect. 15.2.3). 


We should keep in mind that the norm $\|\cdot\|$ used in (6.40) is in terms of
$L^p$ spaces, so that $\|f - g\| = 0$ means f = g almost everywhere. In plain
words, a contraction mapping reduces the distance between two elements in
the $L^p$ space.

We are now ready to move on to the main theorem. 


@ Contraction mapping theorem: 
Let T be a contraction mapping and I be an identity mapping. Then 


the equation 
(T—I)f =0 (6.41) 


has one and only one solution f that belongs to $L^p$.


Remark. The solution f of the equation (6.41) is called a fixed point in $L^p$.


The contraction mapping theorem guarantees the existence and uniqueness of 
fixed points of certain self-mappings and provides a constructive method for 
finding those fixed points. It should be emphasized that the theorem allows 
us to prove the existence (and uniqueness) of solutions of ordinary differen- 
tial equations with respect to Lebesgue integrable functions, as intuitively 
understood if T is set to be a differential operator. 


Proof (of the contraction mapping theorem): For arbitrary $f_0 \in L^p$,
we introduce a sequence of functions $\{f_n\}$ defined by

$$f_1 = T(f_0), \quad f_2 = T(f_1), \quad \cdots, \quad f_n = T(f_{n-1}), \quad \cdots.$$

We shall see below that the sequence $\{f_n\}$ is a Cauchy sequence
and thus has a limit $f = \lim_{n\to\infty} f_n$. It follows from the definition of
T that

$$\|f_n - f_{n+j}\| = \|T(f_{n-1}) - T(f_{n-1+j})\| \le c \|f_{n-1} - f_{n-1+j}\| \le \cdots \le c^n \|f_0 - f_j\|$$ (6.42)

and

$$\|f_0 - f_j\| \le \|f_0 - f_1\| + \cdots + \|f_{j-1} - f_j\| \le \left(1 + c + \cdots + c^{j-1}\right) \|f_0 - f_1\| \le (1 - c)^{-1} \|f_0 - f_1\|,$$ (6.43)

where we used the Minkowski inequality (6.38) with respect to the
p-norm designated by $\|\cdot\|$. From (6.42) and (6.43), we get

$$\|f_n - f_{n+j}\| \le \frac{c^n}{1 - c} \|f_0 - f_1\| \to 0 \quad (n \to \infty).$$

This indicates that $\{f_n\}$ is a Cauchy sequence and thus converges to
a limit (denoted by f) regardless of the choice of $f_0$. Furthermore, the
limit f always belongs to $L^p$ since the space $L^p$ is complete. Hence,
the converging behavior of $\{f_n\}$ to f can be expressed by using the
concept of the norm of $L^p$ as

$$\lim_{n\to\infty} \|f_n - f\| = 0.$$ (6.44)

We then obtain

$$\|T(f) - f\| \le \|T(f) - f_n\| + \|f_n - f\| = \|T(f) - T(f_{n-1})\| + \|f_n - f\| \le c \|f - f_{n-1}\| + \|f_n - f\| \to 0 \quad (n \to \infty),$$ (6.45)

which means that T(f) = f almost everywhere. Consequently,
equation (6.41) has at least one solution that is a limit f of the sequence
$\{f_n\}$ that we introduced.

The uniqueness of the solution f is readily understood. Suppose
$g \in L^p$ such that T(g) = g. We then have

$$\|g - f\| = \|T(g) - T(f)\| \le c \|g - f\|.$$

This means that $\|g - f\| = 0$ since 0 < c < 1, so we have g = f almost
everywhere. ∎
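
The iteration $f_n = T(f_{n-1})$ used in the proof can be sketched in the simplest complete space, the real line with the absolute value as norm, rather than in $L^p$ itself. The map $T(x) = \frac{1}{2}\cos x$ is an illustrative assumption: since $|T'(x)| \le 1/2$, it satisfies (6.40) with c = 1/2. The function name `fixed_point` and its tolerance are likewise hypothetical.

```python
import math


def T(x):
    """A sample contraction on the real line: |T(x) - T(y)| <= 0.5 * |x - y|."""
    return 0.5 * math.cos(x)


def fixed_point(mapping, start, tol=1e-12, max_iter=200):
    """Iterate x_n = mapping(x_{n-1}) until successive iterates differ by < tol."""
    x = start
    for _ in range(max_iter):
        nxt = mapping(x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    raise RuntimeError("iteration did not converge within max_iter steps")


# Two very different starting points lead to the same limit,
# mirroring the independence of the limit from the choice of f_0.
fp_a = fixed_point(T, 0.0)
fp_b = fixed_point(T, 10.0)
print(fp_a, fp_b)
```

In the function-space setting the same loop would act on (discretized) functions with the p-norm in place of the absolute value; that extension is not attempted here.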


Remark. Note that it is our use of the Lebesgue integral (instead of the
Riemann integral) that guarantees the validity of the contraction mapping
theorem. In fact, if we restrict ourselves to the Riemann integral, the limit f
of the sequence $\{f_n\}$ may not belong to $L^p$, and we can no longer obtain the
result (6.45).


6.5.3 Preliminaries for the Central Limit Theorem 


The effectiveness of Lebesgue integrals is also observed in probability theory, 
particularly in the derivation of the central limit theorem, which plays 
a fundamental role in statistical mechanics and in the statistical analysis of 
experimental data. Later, we shall see that employing Lebesgue integrals is 
necessary for proving the central limit theorem, where the Lebesgue con- 
vergence theorem and Fubini’s theorem are used time and again. 




In order to prove the central limit theorem, we introduce a random variable
x (see Sect. 6.1.3); for instance, x may be the number of spots we get
when throwing a pair of dice or a real number that we randomly pick from
an interval on the real axis. Suppose that x lies in a set X on the real axis.
(Here, X may be a continuous interval, a set of discrete points, or a union of
the two.) In modern probability theory, measures characterizing the statistical
properties of the system considered are defined in terms of the Lebesgue
integral. For instance, the probability (or distribution) that x is found in
a subset $X_0 \subset X$ is given by

$$P(x : x \in X_0) = \int_{X_0} p \, d\mu,$$ (6.46)

where $\mu$ is the Lebesgue measure of $X_0$ and p is the probability density
associated with x. In general, p is assumed to satisfy the normalization condition
$\int_X p \, d\mu = 1$. We can state that the random variables x and y are independent
if

$$P(x, y) = P(x) P(y).$$

Moreover, the variables x and y are said to be identically distributed if

$$P(x) = P(y).$$

We also define the expected (or mean) value of x and the variance of
x by the integrals

$$E\{x\} = \int_X x \, p \, d\mu \quad \text{and} \quad V\{x\} = \int_X (x - E\{x\})^2 \, p \, d\mu,$$

respectively, where $\mu$ is the Lebesgue measure of X. In particular, the
expected value of an imaginary exponent $e^{izx}$, where z is real, is known as the
characteristic function.


@ Characteristic function: 
The characteristic function $\varphi_x(z)$ of a random variable x is defined by

$$\varphi_x(z) = E\{e^{izx}\}.$$


It can be shown that

$$E\{e^{iz(x+y)}\} = \varphi_x(z) \varphi_y(z)$$

if the random variables x and y are independent. Furthermore,
we obtain

$$\varphi_x(z) = \varphi_y(z)$$

if and only if the variables x and y are identically distributed. The latter
condition is known as the uniqueness theorem for characteristic functions,
and the proof, which involves Fubini's theorem, can be found in advanced
texts on probability theory.


6.5.4 Central Limit Theorem 


We are now ready to state the key theorem. 


@ Central limit theorem: 
Assume a series of random variables $\{x_n\}$ in which the $x_n$ are independent
and identically distributed. For arbitrary a and b (b > a), we have

$$\lim_{n\to\infty} P\left(a \le \frac{\sqrt{n} \, (\bar{x} - m)}{\sigma} \le b\right) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-\xi^2/2} \, d\xi,$$ (6.47)

where $\bar{x} = (x_1 + x_2 + \cdots + x_n)/n$, $m = E\{x_n\}$, and $\sigma^2 = V\{x_n\}$.


Briefly, the theorem states that, for large n, the suitably shifted and rescaled
average $\bar{x}$ of n random variables takes a value near $\xi$ with a probability
proportional to $e^{-\xi^2/2}$. (Note that $\bar{x}$ is the average of n
variables and not a variable itself.) A random variable with the probability
density proportional to $e^{-\xi^2/2}$ is said to be normally distributed.


Remark. The central limit theorem is very effective in describing various
stochastic phenomena in nature since it can be applied regardless of the
distribution of the n random variables; i.e., almost all classes of random variables
(those with finite mean and variance) obey the theorem as long as they are
independent and identically distributed.
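
The content of (6.47) can be probed with a small Monte Carlo sketch. The choices below are assumptions made for illustration: the $x_j$ are uniform on [0, 1) (mean 1/2, variance 1/12), n = 30 terms per average, 20,000 repetitions, a fixed seed, and the interval [a, b] = [−1, 1]. The function name `standardized_mean` is hypothetical.

```python
import math
import random


def standardized_mean(n, rng):
    """Return sqrt(n) * (mean - m) / sigma for n uniform variables on [0, 1)."""
    mean = sum(rng.random() for _ in range(n)) / n
    m, sigma = 0.5, math.sqrt(1.0 / 12.0)  # mean and standard deviation of U[0, 1)
    return math.sqrt(n) * (mean - m) / sigma


rng = random.Random(12345)
samples = [standardized_mean(30, rng) for _ in range(20_000)]

a, b = -1.0, 1.0
# Fraction of standardized averages that fall in [a, b].
empirical = sum(a <= s <= b for s in samples) / len(samples)
# Right-hand side of (6.47), written with the error function.
gaussian = 0.5 * (math.erf(b / math.sqrt(2.0)) - math.erf(a / math.sqrt(2.0)))
print(empirical, gaussian)
```

The empirical fraction lands close to the Gaussian value even though the underlying variables are uniformly, not normally, distributed.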


An illustrative example of the central limit theorem in physics is the
Maxwell-Boltzmann distribution of an ideal gas. For a given temperature
T, the distribution f(v) of the velocity of gas molecules, with $v = |\mathbf{v}|$, is
known to satisfy the equation

$$f(v) = \left(\frac{m}{2\pi k_B T}\right)^{3/2} \exp\left(-\frac{m v^2}{2 k_B T}\right),$$ (6.48)

where m is the mass of a gas molecule and $k_B$ is the Boltzmann constant. Here,
the velocity $v(t_i)$ as a function of discrete time $t_i$ (i = 1, 2, ···, n) serves as
n random variables. In general, in an equilibrium state, $v(t_i)$ for different $t_i$
is independent and identically distributed and, thus, if n is sufficiently large,
the time average of $v(t_i)$ obeys the normal distribution described by (6.48).
Figure 6.7 shows the distribution of the squared velocity of gas molecules,
which is determined from the formula $4\pi v^2 f(v)$, for various values of T; we
set $k_B = 1.38 \times 10^{-23}$ kg·m²/(s²·K) and $m = 6.6 \times 10^{-27}$ kg by considering
$^4$He molecules. We observe that the mean value of $v^2$ shifts to the right with
an increase in the temperature, which can be intuitively understood to be due
to the acceleration of the molecules at high temperatures.
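
The following sketch evaluates (6.48) with the constants quoted above and checks two properties numerically: the normalization of $4\pi v^2 f(v)$ over $0 \le v < \infty$, and the growth of the mean squared speed with temperature. The closed form $3 k_B T / m$ for the mean of $v^2$ is the standard kinetic-theory result, stated here as a cross-check rather than taken from the text; the cutoff at 30,000 m/s and the helper names are assumptions.

```python
import math

K_B = 1.38e-23   # Boltzmann constant in kg m^2 / (s^2 K), as quoted in the text
M_HE = 6.6e-27   # mass of a 4He particle in kg, as quoted in the text


def maxwell_f(v, temperature):
    """Equation (6.48) for speed v (m/s) at the given temperature (K)."""
    prefactor = (M_HE / (2.0 * math.pi * K_B * temperature)) ** 1.5
    return prefactor * math.exp(-M_HE * v * v / (2.0 * K_B * temperature))


def speed_density(v, temperature):
    """Density 4*pi*v^2*f(v) of the speed v."""
    return 4.0 * math.pi * v * v * maxwell_f(v, temperature)


def midpoint_integral(func, a, b, steps=60_000):
    """Approximate the integral of func over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


temperatures = (150.0, 300.0, 1000.0)
V_MAX = 30_000.0  # cutoff in m/s; the integrand is negligible beyond it
norms = {T: midpoint_integral(lambda v, T=T: speed_density(v, T), 0.0, V_MAX)
         for T in temperatures}
mean_sq = {T: midpoint_integral(lambda v, T=T: v * v * speed_density(v, T), 0.0, V_MAX)
           for T in temperatures}
for T in temperatures:
    print(T, norms[T], mean_sq[T], 3.0 * K_B * T / M_HE)
```

The mean of $v^2$ grows linearly with T, which is the rightward shift described above.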

It is important to emphasize that the central limit theorem holds good for 
any kind of distribution of the n variables {x;} as long as they are independent 
and identically distributed. For example, let us consider n variables that obey 


[Figure 6.7: curves for T = 150 K, 300 K, and 1000 K; horizontal axis v [m/s].]


Fig. 6.7. Distribution of square velocity $v^2$ of $^4$He molecules


the distribution P(x) shown in the top panel of Fig. 6.8. The average of these
variables shows the distributions depicted in the bottom panels of Fig. 6.8,
all of which converge to the normal distribution as n increases. The fact that
the distribution of $\{x_i\}$ can be disregarded is the reason the normal
distribution is so universally observed in a wide variety of stochastic phenomena.


6.5.5 Proof of the Central Limit Theorem 


As some further points have to be discussed in order to prove the central 
limit theorem, we present below only an outline and not a rigorous proof. 
Let us emphasize that the use of Lebesgue integrals is necessary for proving 
the central limit theorem, and the Lebesgue convergence theorem and 
Fubini’s theorem are used time and again. 


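Proof We have only to consider the case of m = 0 and $\sigma = 1$; otherwise,
the new variable $\tilde{x}_n = (x_n - m)/\sigma$ is introduced to yield $E\{\tilde{x}_n\} = 0$ and
$V\{\tilde{x}_n\} = 1$. The characteristic function $\varphi_{y_n}(z)$ for the variable

$$y_n = \frac{x_1 + x_2 + \cdots + x_n}{\sqrt{n}}$$

is given by

$$\varphi_{y_n}(z) = E\left\{\exp\left(\frac{iz}{\sqrt{n}} \sum_{j=1}^{n} x_j\right)\right\} = \prod_{j=1}^{n} \varphi_{x_j}\left(\frac{z}{\sqrt{n}}\right),$$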


[Figure 6.8: top panel, probability versus random variable $x_i$ (mean of $x_i$ = 0.125, variance of $x_i$ = 0.9); bottom panels (a)-(c), distribution of the mean of $\{x_i\}$ for i = 1, 2, ..., 10, for i = 1, 2, ..., 100, and for i = 1, 2, ..., 1000.]


Fig. 6.8. Top: Distributions of a random variable x. Bottom: (a)-(c) Distributions
of the average value $\bar{x}$ of n random variables $x_1, x_2, \cdots, x_n$ with n = 10 for (a),
n = 100 for (b), and n = 1000 for (c). For each, 1000 $\bar{x}$'s are sampled to create
the distribution. With increasing n, the distribution of $\bar{x}$ converges to the normal
distribution around the center of 0.125 as expected


in which the condition that all $x_n$ are independent allows us to obtain the last
expression. Furthermore, since all $x_n$ are identically distributed, we have

$$\varphi_{y_n}(z) = \left[\varphi_x\left(\frac{z}{\sqrt{n}}\right)\right]^n,$$

which is ensured by the uniqueness theorem discussed in Sect. 6.5.3.




We want the limit of $\varphi_{y_n}(z)$ at $n \to \infty$, so we use the formula (see the
lemma below)

$$\lim_{n\to\infty} \varphi_{y_n}(z) = e^{-z^2/2},$$ (6.49)

in which the right-hand side is the characteristic function of a normal
distribution. The result of (6.49) together with the continuity theorem (see
below) states that

$$\lim_{n\to\infty} P(a \le y_n \le b) = P(a \le y \le b) = \frac{1}{\sqrt{2\pi}} \int_a^b e^{-y^2/2} \, dy.$$ ∎ (6.50)

The following theorem forms the basis for the proof of the central limit 
theorem. 


@ Continuity theorem: 
Let x and $x_n$ be random variables such that

$$\lim_{n\to\infty} \varphi_{x_n}(z) = \varphi_x(z).$$

We then obtain

$$\lim_{n\to\infty} P(a \le x_n \le b) = P(a \le x \le b)$$

for arbitrary a, b (b > a) satisfying P(x = a) = P(x = b) = 0.


This theorem states that the convergence of characteristic functions implies 
the convergence of the corresponding distribution functions. Since the proof 
requires the use of Fubini’s theorem as well as the Lebesgue convergence 
theorem and is quite complicated, we do not present it. 


@ Lemma: 
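If $E\{x\} = 0$ and $E\{x^2\} = 1$ for a random variable x, then the
characteristic function $\varphi_x(z)$ satisfies the relation

$$\lim_{n\to\infty} \left[\varphi_x\left(\frac{z}{\sqrt{n}}\right)\right]^n = \exp\left(-\frac{z^2}{2}\right).$$ (6.51)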
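
Relation (6.51) can be checked on a concrete case. The sketch assumes a random variable taking the values +1 and −1 with probability 1/2 each (an example chosen here, not from the text); it has $E\{x\} = 0$, $E\{x^2\} = 1$, and characteristic function $\varphi_x(z) = \cos z$. The sample value z = 1.5 and the function names are likewise assumptions.

```python
import math


def char_fn(z):
    """Characteristic function of x = +1 or -1 with probability 1/2 each: cos(z)."""
    return math.cos(z)


def nth_power(z, n):
    """Left-hand side of (6.51) before the limit: [phi_x(z / sqrt(n))]^n."""
    return char_fn(z / math.sqrt(n)) ** n


z = 1.5
orders = (10, 1_000, 100_000)
approximations = [nth_power(z, n) for n in orders]
limit_value = math.exp(-z * z / 2.0)
errors = [abs(value - limit_value) for value in approximations]
print(approximations, limit_value)
```

The error shrinks roughly like 1/n, consistent with the correction term that appears when the logarithm is expanded.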




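Proof The assumption that $E\{x^2\} = 1 < \infty$ implies that $\varphi_x(z)$ is twice
differentiable. In fact, we obtain

$$\varphi_x(z) = E\{e^{izx}\},$$

$$\varphi_x'(z) = E\left\{\frac{d}{dz} e^{izx}\right\} = E\{ix e^{izx}\},$$

$$\varphi_x''(z) = E\left\{\frac{d^2}{dz^2} e^{izx}\right\} = E\{-x^2 e^{izx}\},$$

where the Lebesgue convergence theorem was used to interchange the
order of differentiation d/dz and integration $\int dx$ associated with calculation
of $E\{\cdots\}$. The twice differentiability of $\varphi_x(z)$ allows us to expand it around
z = 0 as

$$\varphi_x\left(\frac{z}{\sqrt{n}}\right) = \varphi_x(0) + \frac{z}{\sqrt{n}} \varphi_x'(0) + \frac{z^2}{2n} \varphi_x''(\eta),$$

where $\eta$ is small enough to be $|\eta| \le |z|/\sqrt{n}$. Since $\varphi_x(0) = 1$ and
$\varphi_x'(0) = E\{ix\} = 0$, we have

$$\log\left[\varphi_x\left(\frac{z}{\sqrt{n}}\right)\right]^n = n \log\left(1 + \frac{z^2}{2n} \varphi_x''(\eta)\right) = \frac{z^2}{2} \varphi_x''(\eta) + O\left(\frac{z^4}{8n}\right),$$

where we used the inequality $|\varphi_x''(\eta)| \le 1$ to expand the logarithmic term for
$n \gg 1$. As a result, we get

$$\lim_{n\to\infty} \log\left[\varphi_x\left(\frac{z}{\sqrt{n}}\right)\right]^n = \frac{z^2}{2} \varphi_x''(0) = -\frac{z^2}{2},$$

which is equivalent to (6.51). ∎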


Part III 


Complex Analysis 


7 


Complex Functions 


Abstract Differentiation and integration of complex functions are significantly dif- 
ferent from those of real functions. In this chapter, we show that two very impor- 
tant theorems—the Cauchy theorem (Sect. 7.2.2) and the Taylor series expansion 
(Sect. 7.4.3)—result in a broad range of mathematical consequences that are highly 
relevant and useful in mathematical physics. However, before moving on to the 
principal discussion, we deal with the underlying concepts of analytic functions 
(Sect. 7.1.2) and the geometric meaning of analyticity (Sect. 7.1.5). 


7.1 Analytic Functions 


7.1.1 Continuity and Differentiability 


This chapter describes the theory of functions of a complex variable. Let C
denote the set of all elements z of the form

\[ z = x + iy, \]

where x, y ∈ R and i is the familiar symbol defined by i² = −1. Let D be a
domain in C. Then, a complex function defined by

\[ f : D \to \mathbf{C} \]

is a rule that assigns a complex number f(z) to each z ∈ D. This f(z)
is equivalent to an ordered pair of real-valued functions u(z) and v(z). Thus,
f(z) can be written in the form

\[ w = f(z) = u(z) + i v(z). \]

The real-valued functions u(z) and v(z) are called the real and imaginary
parts (or components) of f(z) (see Fig. 7.1). We may write u = Re f and
v = Im f.




Fig. 7.1. A complex function w = f(z) that assigns a point on the w-plane to each 
point on the z-plane 


Once we introduce complex functions, the concepts of differentiation and
integration encountered in ordinary real calculus acquire new depth and
significance. When f(z) has a derivative throughout D, it is referred to as an
analytic function in D. (A more precise definition of analytic functions is given
in Sect. 7.1.2.) We shall see that the condition for a complex-valued function
f(z) to be differentiable with respect to a complex variable z is much stronger
than that for a real-valued function f(x) with respect to a real variable x.
This restriction imposes a great deal of structure on f(z).

An exact definition of an analytic function is obtained by considering its 
derivative with respect to a complex variable z. Therefore, our first task is to 
determine the necessary and sufficient conditions for a complex function f(z) 
to have a derivative with respect to z. Before stating what is meant by the 
derivative f’(z), we begin with the definition of continuity for f(z). 


♠ Continuity of complex functions:
Let f : D → C be a complex function and z_0 a point in D. Then, the
function w = f(z) is continuous at the point z_0 if

\[ \lim_{z \to z_0} f(z) = f(z_0). \tag{7.1} \]

In the limit of (7.1), the complex variable z may approach z_0 from any
direction in D (see Fig. 7.2). Hence, when we say that the limit (7.1) exists, we
mean that the unique quantity f(z_0) must result from the limiting process
regardless of how the limit z → z_0 is taken.

A similar feature is found in the definition of the derivative of f(z). 


♠ Derivatives of complex functions:
A complex function f(z) is said to be differentiable at the point z_0 if
and only if the limit

\[ \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z}, \qquad \Delta z = z - z_0, \tag{7.2} \]

exists and is uniquely determined regardless of the manner in which z
approaches z_0. When the limit exists, we denote it by f'(z_0), the derivative
of f(z) at z_0.


Fig. 7.2. Approaching direction of z to zo 


The definition (7.2) requires that the ratio [f(z_0 + Δz) − f(z_0)]/Δz always
tend to a unique limiting value, no matter the path along which z approaches
z_0. This is an extremely strict condition; in fact, a number of theorems in the
theory of analytic functions are derived from this requirement.

Keep in mind that a function f(z) may be differentiable only at a point,
or on a curve, or throughout a region. An example of a function that is
differentiable at only a single point is presented in Example 3 in Sect. 7.1.2.


7.1.2 Definition of an Analytic Function 


Among many differentiable functions, some specific kinds of functions form 
the class of analytic functions as stated below. 


♠ Analytic functions:
A function f(z) is said to be analytic at the point z = z_0 if and only if
it is differentiable throughout a neighborhood of z = z_0.




Remark. There are some synonyms for the term analytic: holomorphic, 
regular, and regular analytic. 


We offer some comments on the distinction between differentiability and
analyticity. As noted above, the conditions for f(z) to be analytic are more
stringent than those for it to be differentiable; in fact, a function f(z) is said
to be analytic at a point z_0 if it has a derivative at z_0 and at all points in
some neighborhood of z_0. In this context, if we say that a function is analytic on
a curve, we mean that it has a derivative at all points of a narrow
two-dimensional strip containing the curve. If a function is differentiable only
at a point or only along a curve, then it is not analytic there, and we say that it
is singular there. A typical example of f(z) that is differentiable only at a
point is demonstrated in Example 3 below.


Examples 1. The function f(z) = z^n is differentiable and analytic everywhere.
In fact, the limit

\[ \lim_{\Delta z \to 0} \frac{(z_0 + \Delta z)^n - z_0^n}{\Delta z} = \lim_{\Delta z \to 0} \left[ n z_0^{n-1} + \frac{n(n-1)}{2} z_0^{n-2} \Delta z + \cdots + (\Delta z)^{n-1} \right] = n z_0^{n-1} \]

exists for arbitrary z_0, and is clearly independent of the path along which
Δz → 0. This means that any polynomial in z is differentiable and analytic
everywhere.


2. The function f(z) = z* is neither differentiable nor analytic anywhere,
since the limit yields

\[ \lim_{\Delta z \to 0} \frac{(z_0 + \Delta z)^* - z_0^*}{\Delta z} = \lim_{\Delta z \to 0} \frac{\Delta z^*}{\Delta z}. \tag{7.3} \]

If Δz → 0 parallel to the real axis, then Δz = Δz* = Δx so that the limit
equals 1. However, if Δz → 0 parallel to the imaginary axis, then Δz =
iΔy = −Δz* so that the limit equals −1. Therefore, the quantity (7.3)
depends on the path Δz → 0, which means that f(z) = z* is neither
differentiable nor analytic anywhere.

3. The function f(z) = |z|² is differentiable only at the origin. In fact,

\[ f(z_0 + \Delta z) - f(z_0) = (z_0 + \Delta z)(z_0^* + \Delta z^*) - z_0 z_0^* = z_0 \Delta z^* + z_0^* \Delta z + \Delta z \Delta z^*, \]

which yields

\[ \lim_{\Delta z \to 0} \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z} = \lim_{\Delta z \to 0} \frac{z_0 \Delta z^* + z_0^* \Delta z + \Delta z \Delta z^*}{\Delta z} = c z_0 + z_0^*, \]

where c = lim_{Δz→0}(Δz*/Δz) is a complex-valued quantity that depends
on the path of Δz → 0. Hence, the limit noted above is uniquely determined
only when z_0 = 0, which means that the function f(z) = |z|² is
differentiable only at the point z = 0.
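
The path dependence discussed in Examples 1 to 3 can be probed numerically. The sketch below is an illustration added here, not the authors' method: it evaluates the difference quotient of (7.2) along several directions of Δz and reports how much the results disagree. The helper names quotient and spread, the step size, and the test point 1 + 2i are all assumptions.

```python
import cmath

def quotient(f, z0, direction, step=1e-6):
    """Difference quotient [f(z0 + dz) - f(z0)] / dz with dz = step * e^{i*direction}."""
    dz = step * cmath.exp(1j * direction)
    return (f(z0 + dz) - f(z0)) / dz

def spread(f, z0, n_dirs=8):
    """Largest gap between difference quotients taken along n_dirs directions."""
    qs = [quotient(f, z0, 2 * cmath.pi * k / n_dirs) for k in range(n_dirs)]
    return max(abs(a - b) for a in qs for b in qs)

examples = {
    "z**3": lambda z: z ** 3,           # polynomial, as in Example 1
    "conj(z)": lambda z: z.conjugate(),  # Example 2
    "|z|**2": lambda z: abs(z) ** 2,     # Example 3
}
for name, f in examples.items():
    # First number: spread at z0 = 1 + 2i; second number: spread at the origin.
    print(name, spread(f, 1 + 2j), spread(f, 0j))
```

A spread near zero indicates a direction-independent quotient. Only the polynomial should show this at both points, while |z|² should show it at the origin alone.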


7.1.3 Cauchy—Riemann Equations 


Let f : D → C with f(z) = u(z) + iv(z) as usual. We give the necessary and
sufficient conditions for a function f(z) = u(x, y) + iv(x, y) to be differentiable
at a point z_0 ∈ D. Let us assume that f(z) is differentiable at z_0 ∈ D. Then
we have

\[ f'(z_0) = \lim_{\Delta z \to 0} \frac{\Delta f}{\Delta z} = \lim_{\Delta z \to 0} \left( \frac{\Delta u}{\Delta z} + i \frac{\Delta v}{\Delta z} \right). \]

Since f'(z_0) exists, it is independent of the path Δz → 0; i.e., it is independent
of the ratio Δy/Δx. If the limit is taken parallel to the real axis, so that Δy = 0
and Δz = Δx, we have

\[ f'(z_0) = \lim_{\Delta x \to 0} \left( \frac{\Delta u}{\Delta x} + i \frac{\Delta v}{\Delta x} \right) = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x}. \]

On the other hand, if z approaches the point z_0 along the line parallel
to the imaginary axis, so that Δx = 0 and Δz = iΔy, then

\[ f'(z_0) = \lim_{\Delta y \to 0} \left( \frac{\Delta v}{\Delta y} - i \frac{\Delta u}{\Delta y} \right) = \frac{\partial v}{\partial y} - i \frac{\partial u}{\partial y}. \]

From the initial assumption, these two limits must be equal, so equating real
and imaginary parts gives us

\[ \frac{\partial u}{\partial x} = \frac{\partial v}{\partial y} \quad \text{and} \quad \frac{\partial u}{\partial y} = -\frac{\partial v}{\partial x}. \tag{7.4} \]

Equations (7.4) are known as the Cauchy-Riemann relations (abbreviated
as CR relations), and they are a necessary condition for differentiability.
Alone, however, they are not sufficient, because they were derived from only
two special cases of the requirement of differentiability, as demonstrated
above. In fact, the necessary and sufficient conditions for the differentiability
of f(z) at z_0 consist of the following two statements:


♠ Theorem:
A function f(z) is differentiable at zo € D if and only if 


(i) the first-order partial derivatives of u(x,y) and v(x,y) exist and are 
continuous at zo, and 
(ii) those derivatives at zo satisfy the CR equations. 




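Proof We prove that conditions (i) and (ii) imply the differentiability of f(z)
at z_0 ∈ D. (The converse was proven implicitly in the beginning of this
subsection.) From hypothesis (i), the functions u, ∂u/∂x, and ∂u/∂y are all
continuous at the point z_0 = x_0 + iy_0, so we have

\[ \Delta u = u(x_0 + \Delta x, y_0 + \Delta y) - u(x_0, y_0) = \frac{\partial u}{\partial x} \Delta x + \frac{\partial u}{\partial y} \Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y, \tag{7.5} \]

to first order in Δx and Δy. In (7.5), the partial derivatives
are evaluated at the point (x_0, y_0), and the real numbers ε_1 and ε_2 vanish as
Δx, Δy → 0. Using a similar formula for v(x, y), we have

\[ \Delta f = f(z_0 + \Delta z) - f(z_0) = \Delta u + i \Delta v \]
\[ = \left[ \frac{\partial u}{\partial x} \Delta x + \frac{\partial u}{\partial y} \Delta y + \varepsilon_1 \Delta x + \varepsilon_2 \Delta y \right] + i \left[ \frac{\partial v}{\partial x} \Delta x + \frac{\partial v}{\partial y} \Delta y + \varepsilon_3 \Delta x + \varepsilon_4 \Delta y \right]. \]

Using the CR equations that are supposed to hold at the point (x_0, y_0) from
assumption (ii) above gives us

\[ \Delta f = \left( \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \right) (\Delta x + i \Delta y) + \Delta x (\varepsilon_1 + i \varepsilon_3) + \Delta y (\varepsilon_2 + i \varepsilon_4). \]

Dividing both sides by Δz = Δx + iΔy yields

\[ \frac{\Delta f}{\Delta z} = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} + (\varepsilon_1 + i \varepsilon_3) \frac{\Delta x}{\Delta z} + (\varepsilon_2 + i \varepsilon_4) \frac{\Delta y}{\Delta z}. \tag{7.6} \]

Since |Δz| = √((Δx)² + (Δy)²), we have |Δx| ≤ |Δz| and |Δy| ≤ |Δz|,
so that

\[ \left| \frac{\Delta x}{\Delta z} \right| \le 1 \quad \text{and} \quad \left| \frac{\Delta y}{\Delta z} \right| \le 1. \tag{7.7} \]

Hence, it follows from (7.7) that the last two terms in (7.6) tend to zero with
Δz → 0 because lim_{Δz→0} ε_n = 0 (1 ≤ n ≤ 4). As a result, the limit

\[ \lim_{\Delta z \to 0} \frac{\Delta f}{\Delta z} = \frac{\partial u}{\partial x} + i \frac{\partial v}{\partial x} \tag{7.8} \]

is independent of the path of Δz → 0, so the derivative f'(z_0) exists. We
thus have verified that f(z) is differentiable at z_0 if conditions (i) and (ii) are
satisfied. This completes the proof. ♣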




Examples 1. Regarding the function

\[ f(z) = z^2 = (x^2 - y^2) + i(2xy) = u + iv, \tag{7.9} \]

we have

\[ \frac{\partial u}{\partial x} = 2x = \frac{\partial v}{\partial y}, \qquad \frac{\partial u}{\partial y} = -2y = -\frac{\partial v}{\partial x}. \]

These equations mean that everywhere in the complex plane the CR relations
hold and the partial derivatives are continuous. Hence, the function
(7.9) is analytic in the entire complex plane. Such analytic functions are
called entire functions.

2. We saw in Sect. 7.1.2 that the function f(z) = |z|² = x² + y² is not
analytic anywhere since it is differentiable only at the origin. In fact, it
yields

\[ \frac{\partial u}{\partial x} = 2x, \qquad \frac{\partial u}{\partial y} = 2y, \qquad \frac{\partial v}{\partial x} = \frac{\partial v}{\partial y} = 0, \]

which satisfy the CR relations only at the origin.
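
As a hedged numerical companion to the two examples, the following sketch estimates the partial derivatives by central differences and returns the two CR residuals u_x − v_y and u_y + v_x. The function names partials and cr_residual and the sample point (0.7, −1.2) are assumptions introduced for illustration only.

```python
def partials(g, x, y, h=1e-5):
    """Central-difference estimates of (dg/dx, dg/dy) for a real function g(x, y)."""
    gx = (g(x + h, y) - g(x - h, y)) / (2 * h)
    gy = (g(x, y + h) - g(x, y - h)) / (2 * h)
    return gx, gy

def cr_residual(f, x, y):
    """Return (u_x - v_y, u_y + v_x); both vanish where the CR relations (7.4) hold."""
    u = lambda a, b: f(complex(a, b)).real
    v = lambda a, b: f(complex(a, b)).imag
    ux, uy = partials(u, x, y)
    vx, vy = partials(v, x, y)
    return ux - vy, uy + vx

square = lambda z: z * z                          # f(z) = z^2, Example 1
abs_squared = lambda z: complex(abs(z) ** 2, 0.0)  # f(z) = |z|^2, Example 2

print(cr_residual(square, 0.7, -1.2))        # both residuals near zero
print(cr_residual(abs_squared, 0.7, -1.2))   # residuals near (2x, 2y), so CR fails
print(cr_residual(abs_squared, 0.0, 0.0))    # both residuals near zero at the origin
```

Vanishing residuals at a single point, as for |z|² at the origin, indicate differentiability there but not analyticity, which requires the relations to hold throughout a neighborhood.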


7.1.4 Harmonic Functions 


The CR relations immediately provide one remarkable result that points to
connections with physics. Provided that the CR relations hold in a region, we
obtain

\[ \frac{\partial}{\partial x} \frac{\partial u}{\partial x} = \frac{\partial}{\partial x} \frac{\partial v}{\partial y} = \frac{\partial}{\partial y} \frac{\partial v}{\partial x} = -\frac{\partial}{\partial y} \frac{\partial u}{\partial y}. \tag{7.11} \]

Here we assume the continuity of the second-order partial derivatives of u(x, y)
and v(x, y), which allows us to interchange the orders of differentiation in
the mixed partial derivatives in (7.11). (This qualification, however, can be
dropped since the second-order partial derivatives of an analytic function are
necessarily continuous as we prove later.) Equation (7.11) yields the Laplace
equation:

\[ \nabla^2 u = \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0. \]

In the same way, it follows that

\[ \nabla^2 v = 0. \]

Thus we obtain the following theorem:

♠ Theorem:
Each of the real and imaginary parts of an analytic function satisfies the
two-dimensional Laplace equation.

Any function φ satisfying ∇²φ = 0 is called a harmonic function. Accordingly,
if f = u + iv is an analytic function, then u and v are called conjugate
harmonic functions since ∇²u = ∇²v = 0 holds. The fact that real and
imaginary components of analytic functions satisfy the Laplace equation plays
a crucial role in solving applied second-order partial differential equations.
Detailed discussions on this point are presented in Sect. 9.4.3.
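
The theorem can be illustrated with a five-point finite-difference Laplacian. This is only a sketch: the entire function z³ + e^z, the sample point, and the helper name laplacian are assumptions chosen for the example, and x² (the real part of the nonanalytic function x² + iy) is included for contrast.

```python
import cmath

def laplacian(g, x, y, h=1e-3):
    """Five-point finite-difference estimate of g_xx + g_yy at (x, y)."""
    return (g(x + h, y) + g(x - h, y) + g(x, y + h) + g(x, y - h) - 4.0 * g(x, y)) / (h * h)

f = lambda z: z ** 3 + cmath.exp(z)     # an entire function (assumed example)
u = lambda x, y: f(complex(x, y)).real  # real part of f
v = lambda x, y: f(complex(x, y)).imag  # imaginary part of f
w = lambda x, y: x * x                  # real part of the nonanalytic x^2 + iy

for name, g in (("u", u), ("v", v), ("x^2", w)):
    print(name, laplacian(g, 0.3, -0.8))
```

The estimates for u and v should be close to zero, whereas the one for x² should be close to 2, so x² is not harmonic.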


7.1.5 Geometric Interpretation of Analyticity 


To gain in-depth insight into the nature of analytic functions, we reveal the
geometric meaning of “analyticity.” We know that the analyticity of f(z)
within a domain D ensures the existence of the derivative f'(z) = df/dz
defined by

\[ f'(z) = \lim_{h \to 0} \frac{f(z + h) - f(z)}{h}. \]

This suggests that at a point z_0 within D,

\[ f(z_0 + h) - f(z_0) \simeq f'(z_0) h \tag{7.12} \]

for an arbitrary complex number h whose magnitude |h| is sufficiently small.
Let us consider the geometrical meaning of (7.12). For the discussion to
be concrete, we assume, for the moment, that the derivative f'(z) takes the
values

\[ f'(z_0) = 1 + i \quad \text{and} \quad f'(z_1) = \frac{-1 + \sqrt{3}\, i}{2} \]

at the points z_0 and z_1 in D. It then follows that

\[ f'(z_0) h = (1 + i) h = \sqrt{2} \left( \frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}} \right) h = \sqrt{2}\, e^{i\pi/4} h, \tag{7.13} \]

where h = |h|(cos θ + i sin θ) is a complex number having a certain argument
θ. Equation (7.13) means that f'(z_0)h is obtained through the rotation of the
vector h by π/4 followed by multiplication by √2. (Note that any complex
number can be regarded as a vector on the two-dimensional complex plane.)

Similarly, we have

\[ f'(z_1) h = e^{2\pi i/3} h, \tag{7.14} \]

which states that f'(z_1)h is obtained through the rotation of h by 2π/3. The
processes are schematically illustrated in Fig. 7.3. The vector h is depicted by
thin arrows and the corresponding vectors f'(z)h by thick arrows. Noteworthy
is that the magnitude |f'(z)h| at both z_0 and z_1 is invariant no matter what
direction the vector h takes; indeed it follows from (7.13) and (7.14) that

\[ |f'(z_0) h| = \sqrt{2}\, |h| \quad \text{and} \quad |f'(z_1) h| = |h|. \]

Hence, when the direction of h is shifted by increasing θ, |f'(z)h| remains
unchanged so that the front edge of the vector f'(z)h moves along a circle
centered at the origin.




Fig. 7.3. Illustration of analyticity of f(z) at zo. An infinitesimal circle on the z- 
plane centered at an analytic point is mapped to a circle on the w-plane with slight 
modulation 


Now we go back to (7.12), which says that if f(z) is analytic at z_0, the
vectors f'(z_0)h obtained above are almost equal to the vectors f(z_0 + h) −
f(z_0). This implies that the magnitude |f(z_0 + h) − f(z_0)| is almost
invariant to the change in the direction of h characterized by θ. Thus as θ
increases, the front edge of f(z_0 + h) − f(z_0) should trace a circle centered at
the origin. (To be precise, the radius may be subject to a slight fluctuation,
as shown in Fig. 7.3, owing to contributions from terms of order h² and higher.)
In other words, since f(z_0) is fixed, an increase in θ from 0 to 2π results in
movement of f(z_0 + h) along the circle centered at f(z_0). This means that for
analytic functions f(z), the change in the magnitude of f for an infinitesimal
change in z is isotropic. This isotropy is the geometric interpretation of the
analyticity of f(z).

Better understanding can be attained by considering the case of nonanalytic
functions. Let us use the same argument for the function

\[ f(z) = x^2 + iy, \tag{7.15} \]

where u = x² and v = y. This function is not analytic, since it does not satisfy
the CR relations. Indeed,

\[ \frac{\partial u}{\partial x} = 2x \neq \frac{\partial v}{\partial y} = 1 \]

except at x = 1/2. For such a nonanalytic function, the isotropy of the
magnitude of the difference |f(z_0 + h) − f(z_0)| for infinitesimal h breaks down,
as is shown below. Once we set h = |h|(cos θ + i sin θ) with |h| = ε = const,
we have

\[ f(z_0 + h) = (x_0 + \varepsilon \cos\theta)^2 + i (y_0 + \varepsilon \sin\theta) \]
\[ \simeq x_0^2 + 2 \varepsilon \cos\theta \cdot x_0 + i y_0 + i \varepsilon \sin\theta \]
\[ = f(z_0) + 2 \varepsilon \cos\theta \cdot x_0 + i \varepsilon \sin\theta, \tag{7.16} \]




Fig. 7.4. Schematic illustration of nonanalyticity. When f(z) is not analytic at
z = z_0, an infinitesimal circle centered at z_0 is mapped to an ellipse so that the
isotropy breaks down


up to the order of ε. Equation (7.16) indicates that when θ increases, the front
edge of the vector f(z_0 + h) moves along an ellipse whose semi-axes are 2x_0 ε
and ε (see Fig. 7.4). That is, the magnitude |f(z_0 + h) − f(z_0)| is
no longer isotropic, but depends on the direction of h (except for the particular
case of x_0 = 1/2).
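
The isotropy argument lends itself to a small numerical experiment. The sketch below (an added illustration with hypothetical names magnitude_ratio, analytic, and nonanalytic) samples h on a small circle and returns the ratio of the largest to the smallest value of |f(z_0 + h) − f(z_0)|. A ratio near 1 corresponds to the circle of Fig. 7.3, and a ratio away from 1 to the ellipse of Fig. 7.4.

```python
import cmath

def magnitude_ratio(f, z0, eps=1e-3, n_dirs=360):
    """max/min of |f(z0 + h) - f(z0)| over n_dirs directions of h with |h| = eps."""
    mags = []
    for k in range(n_dirs):
        h = eps * cmath.exp(2j * cmath.pi * k / n_dirs)
        mags.append(abs(f(z0 + h) - f(z0)))
    return max(mags) / min(mags)

analytic = lambda z: z * z                             # f(z) = z^2
nonanalytic = lambda z: complex(z.real ** 2, z.imag)   # f(z) = x^2 + iy, as in (7.15)

print(magnitude_ratio(analytic, 1 + 1j))        # near 1: isotropic
print(magnitude_ratio(nonanalytic, 2 + 1j))     # near 2*x0 = 4: anisotropic
print(magnitude_ratio(nonanalytic, 0.5 + 1j))   # near 1: the special case x0 = 1/2
```

The small departures from the ideal values come from the terms of order ε² that (7.16) neglects.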


Exercises 


1. Show that f(z) is continuous at z_0 if it is analytic there.

Solution: From the identity

\[ f(z) - f(z_0) = f(z_0 + \Delta z) - f(z_0) = \Delta z \cdot \frac{f(z_0 + \Delta z) - f(z_0)}{\Delta z}, \]

with the definition Δz = z − z_0, we obtain

\[ \lim_{\Delta z \to 0} \left[ f(z_0 + \Delta z) - f(z_0) \right] = \left( \lim_{\Delta z \to 0} \Delta z \right) f'(z_0) = 0. \]

Moreover, if we write f(z) = u(z) + iv(z), it follows that u(z) and
v(z) are both continuous. ♣
2. Express the Cauchy-Riemann relations in polar coordinates (r, θ).

Solution: By setting z = x + iy = re^{iθ}, we transform the
partial derivative with respect to x into ∂/∂x = (∂r/∂x)(∂/∂r) +
(∂θ/∂x)(∂/∂θ). After some algebra, we obtain ∂/∂x = cos θ (∂/∂r) −
(sin θ/r)(∂/∂θ), which, together with the same procedure with respect
to ∂/∂y, yields the polar form of the CR relations as

\[ \frac{\partial u}{\partial r} = \frac{1}{r} \frac{\partial v}{\partial \theta}, \qquad \frac{\partial u}{\partial \theta} = -r \frac{\partial v}{\partial r}. \]

Their abbreviated forms read u_r = v_θ/r and u_θ = −r v_r. ♣




3. If f(z) is analytic in a region D and if |f(z)| is constant there, then f(z)
is constant. Prove it.

Solution: If |f| = 0, the proof is immediate. Otherwise we have

\[ u^2 + v^2 = c \neq 0. \tag{7.17} \]

Taking the partial derivatives with respect to x and y, we have
u u_x + v v_x = 0 and u u_y + v v_y = 0. Using the CR relations, we
obtain u u_x − v u_y = 0 and v u_x + u u_y = 0, so that

\[ (u^2 + v^2) u_x = 0. \tag{7.18} \]

From (7.17) and (7.18), and from the CR relations, we conclude
that u_x = v_y = 0. We can obtain u_y = v_x = 0 in a similar manner.
Therefore, f is constant. ♣
4. Let φ(x, y) and ψ(x, y) be harmonic functions in a domain D. Show that
if we set u = φ_y − ψ_x and v = φ_x + ψ_y, the function f(z) = u + iv with
the variable z = x + iy becomes analytic in D.

Solution: It follows that u_x − v_y = (φ_yx − ψ_xx) − (φ_xy + ψ_yy) =
−∇²ψ, where φ_yx = φ_xy was used. Since ∇²ψ = 0, we have u_x =
v_y. Similarly, we obtain u_y = −v_x. Hence, u and v satisfy the CR
relations in D, which indicates the analyticity of f on D. ♣


7.2 Complex Integrations 


7.2.1 Integration of Complex Functions 


We now turn to the integration of functions f(z) with respect to a complex
variable z. The theory of integration in the complex plane is just the theory
of the line integral as defined by

\[ \int_{a_1}^{a_2} f(z)\, dz = \lim_{N \to \infty} \sum_{i=1}^{N} f(z_i)\, \Delta z_i. \]

Here (Δz_i) is a sequence of small segments situated at z_i of the curve that
connects the complex number a_1 to the other number a_2 in the z-plane.
Since there are infinitely many choices for connecting a_1 to a_2, it is possible
to obtain different values for the integral for different paths.


Examples Consider the contour integral

\[ I = \int_{C_i} z^*\, dz \]

from z = 1 to z = −1 along the three paths (see Fig. 7.5): (i) the unit circle
centered at the origin in the counterclockwise direction, designated by C_1;
(ii) that in the clockwise direction, denoted by C_2; and (iii) the real axis, C_3.




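Fig. 7.5. Three paths: C_1, C_2, and C_3


(i) The values of z on the circle are given by z = e^{iθ}, so dz = ie^{iθ}dθ. Thus,

\[ I(C_1) = \int_{C_1} z^*\, dz = \int_0^{\pi} e^{-i\theta}\, i e^{i\theta}\, d\theta = \pi i. \tag{7.19} \]

(ii) In a similar manner as in (i), we have

\[ I(C_2) = \int_{C_2} z^*\, dz = \int_0^{-\pi} e^{-i\theta}\, i e^{i\theta}\, d\theta = -\pi i. \tag{7.20} \]

(iii) On the real axis, z = x and dz = dx so that

\[ I(C_3) = \int_{C_3} z^*\, dz = \int_1^{-1} x\, dx = 0. \tag{7.21} \]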

In general, complex integrals on the path C possess the following property:

♠ Darboux’s inequality:
Contour integrals on a path C satisfy the relation

\[ \left| \int_C f(z)\, dz \right| \le ML, \tag{7.22} \]

where M = max |f(z)| on C and L is the length of C.




This property is very useful because in working with complex line integrals it 
is often necessary to establish upper bounds on their absolute values. 


Proof Recall the original definition of complex integrals:

\[ \int_C f(z)\, dz = \lim_{n \to \infty} \sum_{k=1}^{n} f(z_k)\, \Delta z_k. \]

It follows that

\[ \left| \sum_{k=1}^{n} f(z_k)\, \Delta z_k \right| \le \sum_{k=1}^{n} |f(z_k)|\, |\Delta z_k| \le M \sum_{k=1}^{n} |\Delta z_k| \le ML, \]

where we have used the facts that |f(z)| ≤ M for all points z on C, that
Σ|Δz_k| represents the sum of all the chord lengths joining z_{k−1} and z_k, and
that this sum is not greater than the length of C. Taking the limit of both
sides, we obtain the desired inequality (7.22). ♣
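
To see the bound (7.22) at work, the following sketch computes the three quantities that enter it from the same set of chords used in the proof. The choice of f(z) = e^z on the upper unit semicircle and the helper name darboux_terms are assumptions made for illustration.

```python
import cmath

def darboux_terms(f, path, n=5000):
    """Return (|integral|, M, L) for f along path(t), 0 <= t <= 1, using n chords."""
    pts = [path(k / n) for k in range(n + 1)]
    chords = list(zip(pts, pts[1:]))
    integral = sum(f(0.5 * (a + b)) * (b - a) for a, b in chords)
    M = max(abs(f(p)) for p in pts)           # sampled maximum of |f| on the path
    L = sum(abs(b - a) for a, b in chords)    # total chord length, at most the arc length
    return abs(integral), M, L

semicircle = lambda t: cmath.exp(1j * cmath.pi * t)   # from 1 to -1 through i
lhs, M, L = darboux_terms(cmath.exp, semicircle)
print(lhs, M * L)
```

Here the bound is far from tight, which is typical: Darboux’s inequality is mainly used to show that an integral vanishes in some limit, not to estimate its value.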


7.2.2 Cauchy Theorem 


We are now in a position to proceed with the key theorem in the theory of
functions of a complex variable. Consider the complex integral

\[ I(C_i) = \int_{C_i} \sin z\, dz \]

along the paths C_i (i = 1, 2, 3) shown in Fig. 7.6: (a) C_1 = OP, (b)
C_2 = OQ + QP, (c) C_3 = OR + RP. After some algebra, we obtain


Fig. 7.6. Three paths: OQP, OP, and ORP 




\[ I(C_1) = I(C_2) = I(C_3) = \cdots, \]

which suggests that the integral from O to P does not depend on
our choice of integration path. Actually, this is entirely true;
it depends only on the two endpoints O and P. This peculiarity of integration
comes from the fact that the integrand sin z is analytic on the integration
paths in question. (In fact, it is analytic everywhere on the complex plane.)
This result can be generalized to the following statement, called Cauchy’s
theorem, which is pivotal in the theory of complex function analysis.


♠ Cauchy’s theorem:
If f(z) is analytic within and on a closed contour C, then

\[ \oint_C f(z)\, dz = 0. \tag{7.23} \]

The somewhat lengthy discussions that are needed for a proof of Cauchy’s
theorem are beyond the scope of this textbook, but two immediate corollaries
of the theorem are listed below.


♠ Path independence:
If f(z) is analytic in the region R and if contours C_1 and C_2 lie in R
and have the same endpoints, then

\[ \int_{C_1} f(z)\, dz = \int_{C_2} f(z)\, dz. \]

The proof readily follows by applying Cauchy’s theorem to the closed contour
consisting of C_2 and −C_1 as shown in Fig. 7.7:

\[ \int_{C_2} f(z)\, dz + \int_{-C_1} f(z)\, dz = \int_{C_2} f(z)\, dz - \int_{C_1} f(z)\, dz = 0. \]

Intuitively, the symbol −C denotes the contour C traced in the opposite
direction. A discussion on the path independence follows the theorem below.


♠ Uniqueness of the integral:

If f(z) is analytic within a region bounded by a closed contour C, then
the integral ∫_{z_1}^{z_2} f(z)dz along any contour within C depends only on z_1
and z_2.


This theorem states that an analytic function f(z) has not only a unique
derivative but also a unique integral. From a practical viewpoint, this theorem is frequently




Fig. 7.7. Two integration paths: C; and C2 = —C; 
used in the evaluation of contour integrals, since it allows us to choose an 
appropriate contour. 


Remark. When integrating along a closed contour, we agree to move along 
the contour in such a way that the enclosed region lies to our left. An inte- 
gration that follows this convention is called integration in the positive sense. 
Integration performed in the opposite direction acquires a minus sign. 


7.2.3 Integrations on a Multiply Connected Region 




Fig. 7.8. Left: A simply connected region. Right: A multiply connected region 


We should note that Cauchy’s theorem applies in a direct way only to sim- 
ply connected regions. A region R is said to be simply connected if every 




closed curve in R can be continuously contracted into a point without leaving
R. Otherwise, it is said to be multiply connected (see Fig. 7.8). The
reason for this restriction is easy to find. The important fact is that
Cauchy’s theorem presupposes that no singular point is included within
the region bounded by the contour C. If the region R bounded by C is multiply
connected, singular points may lie inside the closed
contour C yet outside the region R in question. In this case, Cauchy’s
theorem no longer holds even though the integrand f(z) is analytic everywhere
in the region.

Nevertheless, there is still a way to apply Cauchy’s theorem to multiply 
connected regions, which is based on allowing the deformation of contours as 
described below. 

Suppose that f(z) is analytic in the region that lies between two closed
contours C and C', where C encloses C'. Draw two lines AB and FE close
together, so as to connect the two contours. Then ABDEFGA, described as
shown in Fig. 7.9, is a closed contour, which we denote by S, and f(z) is
analytic within it. Then, we have

\[ \oint_S f(z)\, dz = 0. \]

Fig. 7.9. Closed contour ABDEFGA that consists of C and C'

Now let the lines AB and FE approach infinitely close to one another. The
contribution from the part BDE tends toward the integral around C in the
positive (i.e., counterclockwise) direction. Similarly, the contribution from
FGA tends toward that around C' in the negative (clockwise) direction, thus
minus that around C' in the positive direction. The contributions from AB
and EF approach equal and opposite values since they ultimately become the
same path described in opposite directions. We thus come to the conclusion
that

\[ \oint_C f(z)\, dz = \oint_{C'} f(z)\, dz. \]

This means that, if a function is analytic between two contours, its integrals
around both contours have the same value.
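
The equality of the integrals around two nested contours can be illustrated numerically. In the sketch below, which is an added example with hypothetical names, f(z) = 1/z is analytic between the unit circle (playing the role of C') and a square of half-width 2 (playing the role of C), and both contours are traversed counterclockwise.

```python
import cmath

def closed_integral(f, vertices, n=1):
    """Midpoint-sum integral of f around the closed polygon through `vertices`,
    using n sample points per edge."""
    total = 0j
    closed = vertices + vertices[:1]          # return to the starting vertex
    for start, end in zip(closed, closed[1:]):
        dz = (end - start) / n
        for k in range(n):
            total += f(start + (k + 0.5) * dz) * dz
    return total

inv = lambda z: 1 / z
circle = [cmath.exp(2j * cmath.pi * k / 4000) for k in range(4000)]  # inner contour
square = [2 + 2j, -2 + 2j, -2 - 2j, 2 - 2j]                          # outer contour

i_circle = closed_integral(inv, circle)
i_square = closed_integral(inv, square, n=4000)
print(i_circle, i_square)
```

Both values should agree with each other and with 2πi to within the discretization error, even though the singularity at z = 0 prevents either integral from vanishing.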


Remark. There is an immediate extension to the case where C encloses several
closed paths C_1, C_2, ···, all external to one another. Because of Cauchy’s
theorem, an integration contour can be moved across any region of the complex
plane over which the integrand is analytic without changing the value of the
integral. It cannot be moved across a hole (the shaded area) or a singularity
(the dot), but it can be made to collapse around either, as shown in Fig. 7.10.
As a result, an integration contour C enclosing n holes or singularities can
be replaced by n separated closed contours C_i, each enclosing a hole or a
singularity, as given by

\[ \oint_C f(z)\, dz = \sum_{i=1}^{n} \oint_{C_i} f(z)\, dz. \]


Fig. 7.10. Collapse of an integration path onto the boundaries of a hole (a large 
shaded region) and singularity (a small shaded dot) 


7.2.4 Primitive Functions 


Here is a definition of the primitive function of a complex function.

♠ Primitive function:
Let f(z) be a function that is continuous in a domain D and has the
property ∮_C f(z)dz = 0 for every closed path C in D. Then, the primitive
function F(z) of f(z) is defined by

\[ F(z) = \int_{z_0}^{z} f(z')\, dz' \qquad (z_0, z \in D), \]

which is analytic in D with the derivative

\[ \frac{dF(z)}{dz} = f(z). \]

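Proof Consider the difference

\[ F(z + \Delta z) - F(z) = \int_{z_0}^{z + \Delta z} f(z')\, dz' - \int_{z_0}^{z} f(z')\, dz' = \int_{z}^{z + \Delta z} f(z')\, dz', \tag{7.24} \]

where we make use of the path-independence property. If we write

\[ \int_{z}^{z + \Delta z} f(z')\, dz' = f(z) \int_{z}^{z + \Delta z} dz' + \int_{z}^{z + \Delta z} [f(z') - f(z)]\, dz' = f(z)\, \Delta z + \int_{z}^{z + \Delta z} [f(z') - f(z)]\, dz', \]

then (7.24) becomes

\[ F(z + \Delta z) - F(z) - f(z)\, \Delta z = \int_{z}^{z + \Delta z} [f(z') - f(z)]\, dz'. \tag{7.25} \]

Since f(z) is continuous, corresponding to an arbitrarily small positive number
ε, there is a number δ such that

\[ |z - z'| < \delta \implies |f(z) - f(z')| < \varepsilon. \]

Now choose |Δz| < δ and take the path from z to z + Δz to be the straight
segment, which ensures |z − z'| < δ for z' on the path in
question. Therefore, we have

\[ \left| \int_{z}^{z + \Delta z} [f(z') - f(z)]\, dz' \right| \le \int_{z}^{z + \Delta z} |f(z') - f(z)|\, |dz'| < \varepsilon |\Delta z|, \]

and (7.25) can be written as

\[ \left| \frac{F(z + \Delta z) - F(z)}{\Delta z} - f(z) \right| < \varepsilon \quad \text{for } |\Delta z| < \delta. \]

Since ε can be arbitrarily small, we conclude that

\[ \lim_{\Delta z \to 0} \frac{F(z + \Delta z) - F(z)}{\Delta z} = f(z), \]

or equivalently,

\[ \frac{dF(z)}{dz} = f(z). \]

This result is obtained for any point in D, so F(z) is analytic in D. ♣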




Fig. 7.11. Integration paths used in Exercise 1 


Exercises 


1. Evaluate the integral

\[ I(C) = \int_C \sin z\, dz \tag{7.26} \]

along the two paths shown in Fig. 7.11: (a) C_1 = OB, (b) C_2 = OA + AB.

Solution: Since sin z = sin(x + iy) = cosh y sin x + i sinh y cos x
and dz = dx + i dy, we can divide (7.26) into real and imaginary
parts as

\[ I(C) = \int_C (\cosh y \sin x\, dx - \sinh y \cos x\, dy) + i \int_C (\cosh y \sin x\, dy + \sinh y \cos x\, dx). \]

Noting x = y along the curve C_1, we have

\[ I(C_1) = (1 + i) \int_0^1 \cosh x \sin x\, dx - (1 - i) \int_0^1 \sinh x \cos x\, dx \]
\[ = [-\cosh x \cos x]_0^1 + i [\sinh x \sin x]_0^1 \]
\[ = (1 - \cosh 1 \cos 1) + i (\sinh 1 \sin 1), \tag{7.27} \]

where we employ integration by parts. Next we evaluate I along
C_2. Along the path from O to A, x = 0 and dx = 0, and along the
path from A to B, y = 1 and dy = 0. Therefore,

\[ I(C_2) = \int_{C_2} \sin z\, dz = -\int_0^1 \sinh y\, dy + \cosh 1 \int_0^1 \sin x\, dx + i \sinh 1 \int_0^1 \cos x\, dx \]
\[ = (1 - \cosh 1 \cos 1) + i (\sinh 1 \sin 1). \]

Observe that I(C_1) = I(C_2). ♣
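
The path independence found in Exercise 1 can be confirmed numerically. The sketch below assumes the vertex positions O = 0, A = i, and B = 1 + i, which follow from the conditions x = y on OB, x = 0 on OA, and y = 1 on AB stated in the solution. The helper integrate_polyline is a hypothetical name.

```python
import cmath

def integrate_polyline(f, vertices, n=2000):
    """Midpoint-sum integral of f along straight segments joining the vertices."""
    total = 0j
    for start, end in zip(vertices, vertices[1:]):
        dz = (end - start) / n
        for k in range(n):
            total += f(start + (k + 0.5) * dz) * dz
    return total

O, A, B = 0j, 1j, 1 + 1j
i_c1 = integrate_polyline(cmath.sin, [O, B])       # path C1 = OB
i_c2 = integrate_polyline(cmath.sin, [O, A, B])    # path C2 = OA + AB
closed_form = 1 - cmath.cos(B)                     # from the primitive -cos z
print(i_c1, i_c2, closed_form)
```

The closed form 1 − cos(1 + i) expands to (1 − cosh 1 cos 1) + i sinh 1 sin 1, matching (7.27).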
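2. Set C: |z| = 1, and calculate the following integrals:

\[ \text{(i)} \; \left| \oint_C \frac{dz}{z} \right|, \qquad \text{(ii)} \; \oint_C \frac{dz}{|z|}, \qquad \text{(iii)} \; \oint_C \frac{|dz|}{z}. \]

Solution: Let z = re^{iθ}, which yields dz = ire^{iθ}dθ and |dz| = r dθ.
Hence, we have the results: (i) |∮_C dz/z| = |∫_0^{2π} (ire^{iθ})/(re^{iθ}) dθ| =
2π; (ii) ∮_C dz/|z| = ∫_0^{2π} (ire^{iθ})/r dθ = 0; (iii) ∮_C |dz|/z =
∫_0^{2π} e^{−iθ} dθ = 0. ♣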
3. Let f(z) be analytic within the unit circle D about the origin. For any two
points z_1 and z_2 in D, there exist two points ξ_1 and ξ_2 on the line
segment [z_1, z_2] that satisfy the relation

\[ f(z_2) - f(z_1) = \left\{ \mathrm{Re}\,[f'(\xi_1)] + i\, \mathrm{Im}\,[f'(\xi_2)] \right\} (z_2 - z_1). \tag{7.28} \]

Prove it. (This is a generalization of the mean value theorem that is
valid for real functions.)

Solution: From the assumptions, we have

\[ f(z_2) - f(z_1) = \int_{z_1}^{z_2} f'(z)\, dz = (z_2 - z_1) \int_0^1 f'(z_1 + t(z_2 - z_1))\, dt \]
\[ = (z_2 - z_1) \left\{ \int_0^1 \mathrm{Re}\,[f'(z_1 + t(z_2 - z_1))]\, dt + i \int_0^1 \mathrm{Im}\,[f'(z_1 + t(z_2 - z_1))]\, dt \right\}. \tag{7.29} \]

Note that the integrals in the last line are both real. Hence, they
satisfy the mean value theorem for integrals of real-valued continuous
functions g, which is expressed by

\[ \int_0^1 g[z_1 + t(z_2 - z_1)]\, dt = g[z_1 + c(z_2 - z_1)] \quad \text{for some } 0 < c < 1. \]

Setting ξ_k = z_1 + c_k(z_2 − z_1) with 0 < c_k < 1 (k = 1, 2), we
get the desired equation (7.28). ♣
7.3 Cauchy Integral Formula and Related Theorem 


7.3.1 Cauchy Integral Formula 


We now turn to the famous integral formula that is the chief tool in the 
application of the theory of analytic functions in physics. 




♠ Cauchy integral formula:
If f(z) is analytic within and on a closed contour C, we have

\[ \oint_C \frac{f(z)}{z - a}\, dz = \begin{cases} 2\pi i f(a) & \text{if } a \text{ is interior to } C, \\ 0 & \text{if } a \text{ is exterior to } C. \end{cases} \tag{7.30} \]

Proof The latter case is trivial; when z = a is exterior to C, the integrand in
(7.30) is analytic within C so that we have at once ∮_C [f(z)/(z − a)]dz = 0
by virtue of the Cauchy theorem. Hence, we consider below only the case where
z = a is within C.

Consider the integral

\[ J(a) = \oint_C \frac{f(z)}{z - a}\, dz \tag{7.31} \]

around a closed contour C within and on which f(z) is analytic. In view of
the discussion in Sect. 7.2.3, the contour C may be deformed into a small
circle of radius r about the point a. Accordingly, the variable z is expressed
by z = a + re^{iθ}.

Now, we rewrite (7.31) as

\[ J(a) = f(a) \oint_C \frac{dz}{z - a} + \oint_C \frac{f(z) - f(a)}{z - a}\, dz. \tag{7.32} \]

The first integral on the right-hand side becomes

\[ \oint_C \frac{dz}{z - a} = \int_0^{2\pi} \frac{i r e^{i\theta}}{r e^{i\theta}}\, d\theta = 2\pi i. \tag{7.33} \]

Hence, (7.30) is confirmed if the second integral of (7.32) vanishes.
To show this, we note the continuity of
f(z) at a, which tells us that for all ε > 0 there exists an appropriate quantity
δ such that

\[ |z - a| < \delta \implies |f(z) - f(a)| < \varepsilon. \]

This implies that for any arbitrarily small ε, we can find r = |z − a| < δ that
satisfies the relation

\[ \left| \oint_C \frac{f(z) - f(a)}{z - a}\, dz \right| \le \oint_C \frac{|f(z) - f(a)|}{|z - a|}\, |dz| < \frac{\varepsilon}{r}\, 2\pi r = 2\pi \varepsilon. \tag{7.34} \]

Thus by taking r small enough, but still greater than zero, the absolute
value of the integral can be made smaller than any preassigned number. Since
the integral does not depend on r, it must vanish. From
(7.32) to (7.34), we obtain the desired equation:

\[ \oint_C \frac{f(z)}{z - a}\, dz = 2\pi i f(a) \quad \text{if } a \text{ is within } C. \tag{7.35} \]

♣


Remark. If a is a point located exactly on the contour C, the integral (7.30)
must be interpreted as a principal value integral (see Sect. 9.4.1).


The Cauchy integral formula gives us another hint by which to comprehend 
the rigid structure of analytic functions: If a function is analytic within and 
on a closed contour C, its value at every point inside C' is determined by its 
values on the bounding curve C. 
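
The statement that boundary values determine interior values can be tried out directly. The following sketch is an illustration under assumed choices (f = exp, the unit circle as C, and the hypothetical helper cauchy_integral). It evaluates the contour integral of (7.30), divided by 2πi, with the trapezoid rule in the angle θ.

```python
import cmath

def cauchy_integral(f, a, center=0j, radius=1.0, n=400):
    """(1 / (2 pi i)) times the integral of f(z)/(z - a) over |z - center| = radius."""
    total = 0j
    for k in range(n):
        w = cmath.exp(2j * cmath.pi * k / n)
        z = center + radius * w
        total += f(z) / (z - a) * (1j * radius * w)   # integrand times dz/dtheta
    return total * (2 * cmath.pi / n) / (2j * cmath.pi)

inside, outside = 0.3 + 0.2j, 2.0 + 0j
print(cauchy_integral(cmath.exp, inside), cmath.exp(inside))   # the two should agree
print(cauchy_integral(cmath.exp, outside))                     # should be near zero
```

Only values of f on the circle enter the computation, yet the result reproduces f(a) for a inside and vanishes for a outside, as (7.30) states.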


7.3.2 Goursat Formula 


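A remarkable consequence of Cauchy’s integral formula is the fact that,
when f(z) is analytic at z = a, all of its derivatives are also analytic.
Furthermore, the region of analyticity for those derivatives is identical with that
of f(z). To prove this, we use the integral representation (7.35) to
evaluate the derivative,

\[ 2\pi i f'(a) = 2\pi i \lim_{h \to 0} \frac{f(a + h) - f(a)}{h} = \lim_{h \to 0} \left[ \frac{1}{h} \oint_C f(z) \left( \frac{1}{z - a - h} - \frac{1}{z - a} \right) dz \right] \]
\[ = \lim_{h \to 0} \oint_C \frac{f(z)}{(z - a - h)(z - a)}\, dz = \oint_C \frac{f(z)}{(z - a)^2}\, dz. \tag{7.36} \]

To justify the last equality, we note that

\[ \left| \oint_C \frac{f(z)}{(z - a - h)(z - a)}\, dz - \oint_C \frac{f(z)}{(z - a)^2}\, dz \right| = \left| h \oint_C \frac{f(z)}{(z - a)^2 (z - a - h)}\, dz \right| \le \frac{|h| M L}{b^2 (b - |h|)}, \tag{7.37} \]

where M is the maximum value of |f(z)| on the contour, L is the length of the
contour, and b is the minimum value of |z − a| on the contour. The right-hand
side of the inequality in (7.37) approaches zero as h → 0, so we have

\[ \lim_{h \to 0} \oint_C \left[ \frac{f(z)}{(z - a)(z - a - h)} - \frac{f(z)}{(z - a)^2} \right] dz = 0, \]

which ensures the equality of the last part of (7.36).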




We can continue with the same process to obtain higher derivatives,
arriving at the general formula for the nth derivative of f at z = a:

♠ Goursat formula:
If f(z) is analytic within and on a closed contour C, we have

\[ f^{(n)}(a) = \frac{n!}{2\pi i} \oint_C \frac{f(z)}{(z - a)^{n+1}}\, dz. \tag{7.38} \]

Note that equation (7.38) guarantees the existence of all the derivatives
f'(a), f''(a), ··· and the analyticity at all a’s within C.


Remark. The Goursat formula (7.38) is valid only within the contour, and 
thus gives no information as to the existence of the derivatives just on the 
contour. 
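
As a rough numerical illustration of (7.38), the sketch below evaluates the contour integral on a circle centered at a with a trapezoidal rule. The helper name `goursat_derivative` and the sample count are assumptions made for this example, not part of the text.

```python
import cmath
import math


def goursat_derivative(f, a, order, radius=1.0, samples=4000):
    """order-th derivative of f at a from
    (order! / (2*pi*i)) * closed integral of f(z)/(z - a)^(order+1) dz."""
    total = 0j
    dtheta = 2.0 * math.pi / samples
    for k in range(samples):
        w = cmath.exp(1j * k * dtheta)
        z = a + radius * w
        dz = 1j * radius * w * dtheta
        total += f(z) / (z - a) ** (order + 1) * dz
    return math.factorial(order) * total / (2j * math.pi)


# Every derivative of exp at a equals exp(a); the second derivative of sin is -sin.
d3 = goursat_derivative(cmath.exp, 0.5j, 3)
d2 = goursat_derivative(cmath.sin, 0.2, 2)
print(abs(d3 - cmath.exp(0.5j)), abs(d2 + cmath.sin(0.2)))
```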


7.3.3 Absence of Extrema in Analytic Regions 


An additional noteworthy fact associated with Cauchy’s integral formula 
(7.30) is that it points up the absence of either a maximum or a minimum of 
an analytic function within a region of analyticity. 

For example, if z = a is a point within C, from (7.30) we see that 


$$f(a) = \frac{1}{2\pi} \int_0^{2\pi} f(a + \rho e^{i\theta})\,d\theta, \tag{7.39}$$


which means that f(a) is the arithmetic average of the values of f(z) on any
circle centered at a. We thus have |f(a)| ≤ M, where M is the maximum
value of |f| just on the circle. (Equality can occur only if f is constant on the
contour.)

The above argument applies to arbitrary points within the circle and,
further, to a region bounded by any contour C (not necessarily a circle). We
thus conclude that the inequality |f(z)| ≤ M holds for all z within C, which
means that |f(z)| has no maximum within the region of analyticity.

Similarly, if f(z) has no zero within C, then 1/f(z) is an analytic function
inside C and |1/f(z)| has no maximum within C, taking its maximum value
on C. Therefore |f(z)| does not have a minimum within C but does have one
on the contour C. We thus arrive at the following important theorem.


♠ Absolute maximum theorem:
If a nonconstant function f(z) is analytic within and on a closed contour C,
then |f(z)| can have no maximum within C.


♠ Absolute minimum theorem:
If a nonconstant function f(z) is analytic within and on a closed contour
C, and if f(z) ≠ 0 there, then |f(z)| can have no minimum within C.


Accordingly, points at which df/dz = 0 are saddle points, rather than true 
maxima or minima. 
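
The two theorems and the mean-value property (7.39) can be probed numerically. The sketch below uses a hypothetical test function f(z) = e^z + 3, chosen here only because it is entire and has no zero in the closed unit disk; the sampling radii and helper name are likewise assumptions.

```python
import cmath
import math


def f(z):
    # Hypothetical test function: entire, and nonzero for |z| <= 1.
    return cmath.exp(z) + 3.0


def sample_moduli(func, radius, n_angles=720):
    """Return |func| sampled at n_angles points of the circle |z| = radius."""
    return [abs(func(radius * cmath.exp(2j * math.pi * k / n_angles)))
            for k in range(n_angles)]


boundary = sample_moduli(f, 1.0)
interior = [m for r in (0.0, 0.2, 0.4, 0.6, 0.8) for m in sample_moduli(f, r)]

# Mean-value property (7.39): f(0) is the average of f over the unit circle.
mean_on_circle = sum(f(cmath.exp(2j * math.pi * k / 720)) for k in range(720)) / 720

print(min(boundary), min(interior), max(interior), max(boundary))
print(abs(mean_on_circle - f(0)))
```

All sampled interior moduli fall strictly between the smallest and largest boundary values, as the theorems require for this particular function.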

We further observe that the theorems apply not only to |f(z)| but also to 
the real and imaginary parts of an analytic function. To see this, we rewrite 
(7.39) as 


$$2\pi f(a) = 2\pi (u_a + i v_a) \quad \text{and} \quad 2\pi f(a) = \int_0^{2\pi} f(a + \rho e^{i\theta})\,d\theta = \int_0^{2\pi} (u + iv)\,d\theta, \tag{7.40}$$

where $u_a$ and $v_a$ are the values of u(x, y) and v(x, y) at z = x + iy = a.
Equating the last terms of the two equations in (7.40), we obtain

$$u_a = \frac{1}{2\pi} \int_0^{2\pi} u\,d\theta \quad \text{and} \quad v_a = \frac{1}{2\pi} \int_0^{2\pi} v\,d\theta,$$

so that $u_a$ and $v_a$ are the arithmetic averages of the values of u(x, y) and
v(x, y), respectively, on the boundary of the circle. Hence, based on the same
reasoning as above, we see that both u and v take on their minimum and
maximum values on the boundary curve of a region within which f is analytic.


7.3.4 Liouville Theorem 


We saw in the previous discussion that |f(z)| has its maximum M on the
boundary of the region of analyticity of f(z). In certain cases, the maximum
of |f(z)| bounds the absolute value of the derivatives $|f^{(n)}(z)|$, as stated in the
theorem below.


♠ Cauchy inequality:
If f(z) is analytic within and on a circle C of radius r centered at $z_0$, and
M(r) is the maximum of |f(z)| on C, then we have

$$|f^{(n)}(z_0)| \le \frac{n!}{r^n} M(r).$$

This is called the Cauchy inequality.


Proof Goursat’s formula reads

$$f^{(n)}(z_0) = \frac{n!}{2\pi i} \oint_C \frac{f(z)}{(z-z_0)^{n+1}}\,dz.$$

Take $|z - z_0| = r$ and use the Darboux inequality to get the desired result:

$$|f^{(n)}(z_0)| \le \frac{n!}{2\pi} \left| \oint_C \frac{f(z)}{(z-z_0)^{n+1}}\,dz \right| \le \frac{n!}{2\pi r^{n+1}} M(r) \cdot 2\pi r = \frac{n!}{r^n} M(r). \tag{7.41}$$ ♣


If the f(z) we have considered is analytic at all points on the complex plane,
i.e., if it is an entire function, the above result reduces to the following
theorem:


♠ Liouville theorem:
If f(z) is an entire function and |f(z)| is bounded for all values of z,
then f(z) is a constant.


Proof Let n = 1 and M(r) = M in (7.41) to obtain

$$|f'(z_0)| \le \frac{M}{r}.$$

Since f(z) is an entire function, we may take r as large as we like. Thus we can
make $|f'(z_0)| < \varepsilon$ for any preassigned ε > 0. That is, $|f'(z_0)| = 0$, which implies
that $f'(z_0) = 0$ for all $z_0$, so f(z) = const. ♣


Liouville’s theorem is a very powerful statement about analytic functions over
the complex plane. In fact, if we restrict our attention to the real axis, then it
becomes possible to find many real functions that are entire and bounded but
are not constant; cos x and $e^{-x^2}$ are cases in point. In contrast, there is no
such freedom for complex analytic functions; any nonconstant analytic function
is either not bounded (goes to infinity somewhere on the complex plane) or not
entire (is not analytic at some points of the complex plane).
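
The contrast between the real axis and the complex plane is easy to see for the cosine. The short sketch below (sample points chosen arbitrarily for illustration) evaluates |cos z| along the real axis and along the imaginary axis, where cos(iy) = cosh y.

```python
import cmath

# On the real axis the cosine never exceeds 1 in modulus ...
real_values = [abs(cmath.cos(x)) for x in (0.0, 1.0, 10.0, 100.0)]

# ... but along the imaginary axis cos(iy) = cosh(y) grows without bound.
imag_values = [abs(cmath.cos(1j * y)) for y in (0.0, 1.0, 10.0, 20.0)]

print(real_values)
print(imag_values)
```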


7.3.5 Fundamental Theorem of Algebra 


The next theorem follows easily from Liouville’s theorem and provides a re- 
markable tie-up between analysis and algebra. In what follows, the points z 
at which f(z) = 0 are called the zeros of f(z) or roots of f(z). 


♠ Fundamental theorem of algebra:
Every nonconstant polynomial of degree n with complex coefficients has
n zeros in the complex plane.


Proof Let P(z) be any polynomial. If P(z) ≠ 0 for all z, then the function
f(z) = 1/P(z) is entire. Moreover, if P is nonconstant, then |P| → ∞ as
|z| → ∞ so that f is bounded. Hence, in view of Liouville’s theorem, f must be
a constant. This result means that P is also a constant, which is contrary to
our assumption that P is a nonconstant polynomial. We thus conclude that
P(z) has at least one zero in the complex plane.

Furthermore, an induction argument shows that an nth-degree polynomial
has n zeros (counting multiplicity; see Remark 1 below). If we assume that
every kth-degree polynomial can be written

$$P_k(z) = A(z - a_1) \cdots (z - a_k),$$

then, since a polynomial $P_{k+1}(z)$ of degree k + 1 has at least one zero $a_0$ and
division by $(z - a_0)$ leaves a polynomial of degree k, it follows that

$$P_{k+1}(z) = A(z - a_0)(z - a_1) \cdots (z - a_k).$$ ♣
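
The induction step, in which one zero is split off and the degree drops by one, can be sketched with Horner's scheme (synthetic division). The function name `deflate` and the sample cubic are hypothetical, introduced here for illustration only.

```python
def deflate(coeffs, root):
    """Divide a polynomial by (z - root) using Horner's scheme.

    coeffs lists the coefficients from the highest power down.
    Returns (quotient_coeffs, remainder); the remainder equals P(root).
    """
    partial = [coeffs[0]]
    for c in coeffs[1:]:
        partial.append(c + root * partial[-1])
    return partial[:-1], partial[-1]


# P(z) = z^3 - 6 z^2 + 11 z - 6 = (z - 1)(z - 2)(z - 3)
p3 = [1, -6, 11, -6]
p2, r1 = deflate(p3, 1)   # split off the zero at z = 1
p1, r2 = deflate(p2, 2)   # split off the zero at z = 2
p0, r3 = deflate(p1, 3)   # split off the zero at z = 3
print(p2, p1, p0, (r1, r2, r3))
```

Each division leaves remainder zero and lowers the degree by one, so after three steps only the leading constant A = 1 remains.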


Remark.


1. The point a is called a zero of order k (or zero of multiplicity k) of
the function P(z) if it reads

$$P(z) = (z - a)^k Q(z),$$

where Q(z) is a polynomial with Q(a) ≠ 0. Equivalently, a is a zero of
order k if

$$P(a) = P'(a) = \cdots = P^{(k-1)}(a) = 0 \quad \text{and} \quad P^{(k)}(a) \neq 0.$$

2. It can be shown that if $f_1(z)$ and $f_2(z)$ are analytic within and on C and
if $|f_2(z)| < |f_1(z)| \neq 0$ on C, then $f_1(z)$ and $f_1(z) + f_2(z)$ have the same
number of zeros within C. This is called Rouché’s theorem, which is
verified in Sect. 9.3.4.


7.3.6 Morera Theorem 


The final important theorem is called Morera’s theorem and is, in a sense, the
converse of Cauchy’s theorem.


♠ Morera theorem:
Let f(z) be a continuous function on some domain D and suppose that

$$\oint_C f(z)\,dz = 0$$

for every simple closed curve C in D whose interior also lies in D. Then f
is analytic in D.


Proof For some fixed point $z_0$ in D, define the function

$$F(z) = \int_{z_0}^{z} f(z')\,dz', \quad z \in D,$$

where the path is along the line segment in D from $z_0$ to z. From this, we
have

$$\frac{F(z + \Delta z) - F(z)}{\Delta z} = \frac{f(z)}{\Delta z} \int_z^{z+\Delta z} dz' + \frac{1}{\Delta z} \int_z^{z+\Delta z} [f(z') - f(z)]\,dz' \to f(z),$$

where Darboux’s inequality is used in the second term in the limit Δz → 0.
As a result, we get

$$F'(z) = f(z),$$

which indicates the existence of the first derivative of F(z), so F(z) is analytic
in D. Since the derivative of an analytic function is itself analytic
(Sect. 7.3.2), f(z) is also analytic. ♣


Exercises
1. Let f(z) be analytic within a circle D: |z| < R, and let it satisfy the
relations |f(z)| ≤ M and f(0) = 0.
(i) Prove that

$$|f(z)| \le \frac{M}{R} |z| \quad \text{for } z \in D. \tag{7.42}$$

(ii) Prove that the equality in (7.42) holds at $z = z_0$ if and only if there
exists a complex number c that yields |c| = 1 and

$$f(z) = c \frac{M}{R} z. \tag{7.43}$$

Statements (i) and (ii) constitute the Schwarz lemma.

Solution: (i) Inequality (7.42) holds trivially for z = 0. To
treat the case of z ≠ 0, we specify the circle D′: |z| = ρ < R
and set the function g(z) = f(z)/z. Since f(0) = 0, g is analytic within and
on D′, and it follows from the theorem in Sect. 7.3.3 that

$$|g(z)| \le \max_{|z| = \rho} |g(z)| \le \frac{M}{\rho} \quad \text{for } z \text{ within } D',$$

which means that

$$|f(z)| \le \frac{M}{\rho} |z| \quad \text{for } z \text{ within } D'.$$

By fixing z within D′ and taking the limit of ρ to R, we arrive at
(7.42).

(ii) If the equality in (7.42) holds at some $z_0 \in D$ except at the
origin, we have

$$|g(z_0)| = \frac{M}{R} \ge |g(z)| \quad \text{for } z \in D.$$

It follows again from the theorem in Sect. 7.3.3 that g(z) must be
constant within D. Hence, we have

$$g(z) = c \frac{M}{R} \quad \text{with } |c| = 1.$$

This reduces to the desired result (7.43). ♣


2. Let f(z) be analytic on a domain D and f(z) ≢ 0. Show that if f(a) = 0
with a ∈ D, then it is always possible to find small ρ > 0 such that

$$0 < |z - a| < \rho \;\Rightarrow\; f(z) \neq 0.$$

This means that zeros of f(z) are necessarily isolated from each other.


Solution: Suppose that z = a is an nth-order zero of f(z). From the
definition of a zero of a complex function, there exists an n ∈ N
such that

$$p < n \;\Rightarrow\; f^{(p)}(a) = 0 \quad \text{and} \quad f^{(n)}(a) \neq 0.$$

Hence, the Taylor series of f(z) around z = a reads

$$f(z) = \sum_{p=0}^{\infty} \frac{f^{(n+p)}(a)}{(n+p)!} (z-a)^{n+p} = (z-a)^n g_n(z),$$

where

$$g_n(z) = \sum_{p=0}^{\infty} \frac{f^{(n+p)}(a)}{(n+p)!} (z-a)^p, \quad \text{so} \quad g_n(a) = \frac{f^{(n)}(a)}{n!} \neq 0.$$

Since $g_n(z)$ is analytic at a, it is continuous there. Thus we can
find ρ > 0 such that

$$|z - a| < \rho \;\Rightarrow\; |g_n(z) - g_n(a)| < \frac{1}{2} \frac{|f^{(n)}(a)|}{n!}.$$

It follows from the triangle inequality that

$$|g_n(z)| \ge |g_n(a)| - |g_n(z) - g_n(a)| > \frac{|f^{(n)}(a)|}{n!} - \frac{1}{2} \frac{|f^{(n)}(a)|}{n!} = \frac{1}{2} \frac{|f^{(n)}(a)|}{n!} > 0.$$

This implies that for our choice of ρ,

$$0 < |z - a| < \rho \;\Rightarrow\; f(z) = (z-a)^n g_n(z) \neq 0.$$ ♣




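3. Obtain an alternative form of Cauchy’s integral formula expressed by

$$f(z) = f(re^{i\theta}) = \frac{R^2 - r^2}{2\pi} \int_0^{2\pi} \frac{f(Re^{i\phi})}{R^2 - 2rR\cos(\theta - \phi) + r^2}\,d\phi$$

that is valid for |z| < R if f(z) is analytic for |z| ≤ R. This is called
Poisson’s integral formula.


Solution: Consider the function

$$g(\zeta) = \frac{\bar{z} f(\zeta)}{R^2 - \bar{z}\zeta},$$

which is analytic for |ζ| ≤ R. Hence, for the contour C: |ζ| = R,
we have $\oint_C g(\zeta)\,d\zeta = 0$. Furthermore, Cauchy’s integral formula
tells us that $\oint_C f(\zeta)/(\zeta - z)\,d\zeta = 2\pi i f(z)$. From these two results, we
obtain

$$f(z) = \frac{1}{2\pi i} \oint_C \left( \frac{1}{\zeta - z} + \frac{\bar{z}}{R^2 - \bar{z}\zeta} \right) f(\zeta)\,d\zeta = \frac{1}{2\pi i} \oint_C \frac{(R^2 - |z|^2) f(\zeta)}{(\zeta - z)(R^2 - \bar{z}\zeta)}\,d\zeta. \tag{7.44}$$

Setting $z = re^{i\theta}$ and $\zeta = Re^{i\phi}$, we have

$$(\zeta - z)(R^2 - \bar{z}\zeta) = (Re^{i\phi} - re^{i\theta})(R^2 - re^{-i\theta} Re^{i\phi}) = Re^{i\phi} [R^2 - 2rR\cos(\theta - \phi) + r^2].$$

Substituting in (7.44) together with $d\zeta = iRe^{i\phi}\,d\phi$, we arrive at the
desired formula. ♣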
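
Poisson's integral formula lends itself to a direct numerical check. The following is a minimal sketch under the assumption of a trapezoidal rule in the angle on the circle of radius R; `poisson_integral` is a hypothetical helper name.

```python
import cmath
import math


def poisson_integral(f, z, R=1.0, samples=2000):
    """Evaluate the right-hand side of Poisson's integral formula at z, |z| < R."""
    r, theta = abs(z), cmath.phase(z)
    total = 0j
    dphi = 2.0 * math.pi / samples
    for k in range(samples):
        phi = k * dphi
        kernel = R * R - 2.0 * r * R * math.cos(theta - phi) + r * r
        total += f(R * cmath.exp(1j * phi)) / kernel * dphi
    return (R * R - r * r) / (2.0 * math.pi) * total


z0 = 0.5 * cmath.exp(0.7j)
print(abs(poisson_integral(cmath.exp, z0) - cmath.exp(z0)))
```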


7.4 Series Representations 


7.4.1 Circle of Convergence 


We now turn to a very important notion: series representations of complex 
analytic functions. To begin with, we note (without proof) that most of the 
definitions and theorems in connection with the convergence of series of real 
numbers and real functions presented in Chaps. 2 and 3 can be applied to
complex counterparts with little or no change. Here we give a basic theo- 
rem regarding the convergence property of infinite power series consisting of 
complex numbers. 


♠ Theorem:
If the power series

$$\sum_{n=0}^{\infty} a_n z^n \tag{7.45}$$

converges at $z = z_0 \neq 0$, then it converges absolutely at every point of
$|z| < |z_0|$ and, furthermore, it converges uniformly for |z| ≤ ρ where
$0 < \rho < |z_0|$.


Proof We first prove the statement regarding absolute convergence. By
hypothesis, the series $\sum_n a_n z_0^n$ converges. We set

$$S_n = \sum_{k=0}^{n} a_k z_0^k,$$

to obtain

$$|S_n - S_{n-1}| = |a_n z_0^n| \to 0 \quad (n \to \infty).$$

Hence, there exists an integer M > 0 that satisfies

$$|a_n z_0^n| < M \quad \text{for all } n,$$

which implies

$$\sum_{n=0}^{\infty} |a_n z^n| = \sum_{n=0}^{\infty} |a_n z_0^n| \left| \frac{z}{z_0} \right|^n < M \sum_{n=0}^{\infty} \left| \frac{z}{z_0} \right|^n.$$

Therefore, if $|z| < |z_0|$, the right-hand side converges so that the series (7.45)
converges absolutely.

Next we consider uniform convergence. For every z satisfying the relation
$|z| \le \rho < |z_0|$, we have

$$\sum_{n=0}^{\infty} |a_n z^n| \le M \sum_{n=0}^{\infty} \left( \frac{\rho}{|z_0|} \right)^n < \infty,$$

since $0 < \rho/|z_0| < 1$. In view of the Weierstrass M-test, we conclude that the
series (7.45) converges uniformly on the region |z| ≤ ρ. ♣


This theorem implies that the convergence behavior of the power series

$$\sum_{n=0}^{\infty} a_n z^n \tag{7.46}$$

can be classified into the following three types:


1. It converges at all z.

2. It converges (absolutely) for |z| < R, but diverges for
|z| > R, in which the real constant R depends on the feature of the series.

3. It diverges at all z except the origin.


This classification leads us to introduce the concept of the radius of
convergence R of the power series (7.46). For the above three cases, it becomes


1. R = ∞, 2. R itself, 3. R = 0,


respectively. The circle C with the radius R about the origin is called the
circle of convergence associated with the series. Note that just on C, the
convergence behavior of the corresponding series is inconclusive: it may or may
not converge.




The following theorems provide us with a clue for finding the radius of 
convergence of a given power series. 


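♠ Theorems:
Given a power series $\sum_{n=0}^{\infty} a_n z^n$, its radius of convergence R equals

$$\text{(i)} \quad R = \lim_{n\to\infty} \left| \frac{a_n}{a_{n+1}} \right|, \quad \text{if the limit exists;}$$

$$\text{(ii)} \quad R = \frac{1}{\limsup_{n\to\infty} \sqrt[n]{|a_n|}}.$$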
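
To make the two criteria concrete, the sketch below evaluates the quantities inside the limits at a finite index n for two sample coefficient sequences. The helper names and the choice of sequences are assumptions for illustration; a finite n only hints at the limit and does not prove it.

```python
import math


def ratio_estimate(a, n):
    """|a_n / a_{n+1}|, the quantity whose limit gives R in criterion (i)."""
    return abs(a(n) / a(n + 1))


def root_estimate(a, n):
    """1 / |a_n|^(1/n), the quantity entering criterion (ii)."""
    return 1.0 / abs(a(n)) ** (1.0 / n)


def geometric(n):
    return 2.0 ** n                  # series sum (2z)^n, R = 1/2


def exponential(n):
    return 1.0 / math.factorial(n)   # series sum z^n / n!, R = infinity


print(ratio_estimate(geometric, 50), root_estimate(geometric, 50))
print(ratio_estimate(exponential, 50), root_estimate(exponential, 50))
```

For the geometric coefficients both estimates sit at 1/2, whereas for the exponential coefficients they keep growing with n, consistent with an infinite radius of convergence.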


7.4.2 Singularity on the Radius of Convergence 


Given a complex-valued power series, the convergence criterion based on the 
radius of convergence discussed in the previous subsection does not provide 
us with any information about the convergence property of the series just on 
the circle of convergence. We present below two important theorems regarding 
the latter point. 


♠ Theorem:
If the power series $\sum_{n=0}^{\infty} a_n z^n$ has a radius of convergence R, then it
has at least one singularity on the circle |z| = R.


Proof Set

$$f(z) = \sum_{n=0}^{\infty} a_n z^n.$$

If f(z) were analytic at every point on the circle of convergence, then for each
$z_0$ with $|z_0| = R$, there would exist some maximal $\varepsilon_{z_0}$ such that f(z) could be
continued analytically to a circular region $|z - z_0| < \varepsilon_{z_0}$, where $z_0$ is located
on the circle |z| = R. (See Sect. 8.3 for details of analytic continuation.)
Here $\varepsilon_{z_0}$ would depend on $z_0$ and we define

$$\varepsilon = \min_{|z_0| = R} \varepsilon_{z_0} > 0.$$

By performing continuations successfully for all possible $z_0$, we obtain a
function g(z) that is analytic for |z| < R + ε. Clearly for |z| < R, g must be
identical to f. In addition, g must have a power series representation,

$$g(z) = \sum_{n=0}^{\infty} b_n z^n, \tag{7.47}$$

that is convergent for |z| < R + ε. Yet since for |z| < R

$$g(z) = f(z) = \sum_{n=0}^{\infty} a_n z^n,$$

we conclude that

$$b_n = a_n.$$

This implies that the radius of convergence of (7.47) would be R, which clearly
gives us a contradiction. We thus conclude that f(z) has at least one
singularity on the circle |z| = R. ♣


In general, it is difficult to determine when a function has a singularity at a 
particular point on the circle of convergence of its power series. The following 
theorem is one of the few results we have in this direction. 


♠ Theorem:
Suppose that a power series $\sum_{n=0}^{\infty} a_n z^n$ has a radius of convergence
R < ∞ and that $a_n \ge 0$ for all n. Then the series has a singularity at
z = R on the real axis.


Proof By the previous theorem, the function

$$f(z) = \sum_{n=0}^{\infty} a_n z^n$$

has a singularity at some point $Re^{i\theta}$. If we consider the power series for f
about a point $\rho e^{i\theta}$ with 0 < ρ < R, we have

$$f(z) = \sum_{n=0}^{\infty} b_n (z - \rho e^{i\theta})^n = \sum_{n=0}^{\infty} \frac{f^{(n)}(\rho e^{i\theta})}{n!} (z - \rho e^{i\theta})^n,$$

where the radius of convergence is R - ρ. (If it were larger, the power series
would define an analytic continuation of f beyond $Re^{i\theta}$.) Note, however,
that for any nonnegative integer j, the derivative $f^{(j)}$ reads

$$f^{(j)}(\rho e^{i\theta}) = \sum_{n=j}^{\infty} n(n-1) \cdots (n-j+1)\, a_n (\rho e^{i\theta})^{n-j}.$$

Since $a_n \ge 0$, we have

$$|f^{(j)}(\rho e^{i\theta})| \le f^{(j)}(\rho).$$

This implies that the power series representation of f around z = ρ, expressed
by

$$\sum_{n=0}^{\infty} \frac{f^{(n)}(\rho)}{n!} (z - \rho)^n,$$

must have a radius of convergence R - ρ. On the other hand, if f were analytic
at z = R, the above power series would converge on a disc of radius greater
than R - ρ. Therefore, f is singular at z = R. ♣


7.4.3 Taylor Series 


Below is one of the main theorems of this section, which states that any
analytic function can be expanded into a power series around its analytic 
point. 


♠ Taylor series expansion:
If f(z) is analytic within and on the circle C of radius r around z = a,
then there exists a unique and uniformly convergent series in powers of
(z - a),

$$f(z) = \sum_{k=0}^{\infty} c_k (z-a)^k \quad (|z-a| < r), \tag{7.48}$$

with

$$c_k = \frac{1}{k!} f^{(k)}(a) = \frac{1}{2\pi i} \oint_C \frac{f(\zeta)}{(\zeta - a)^{k+1}}\,d\zeta.$$


The largest circle C for which the power series (7.48) converges is called the 
circle of convergence of the power series and its radius is called the radius 
of convergence. 


Proof Let f(z) be analytic within and on a closed contour C. From Cauchy’s
integral formula, we have

$$f(a+h) = \frac{1}{2\pi i} \oint_C \frac{f(z)}{z-a-h}\,dz, \tag{7.49}$$

where a + h is inside the contour C. The contour is taken to be a circle about a,
inasmuch as the region of convergence of the resulting series is circular. We
employ the identity

$$\sum_{n=0}^{N-1} \frac{h^n}{(z-a)^n} = 1 + \frac{h}{z-a} + \frac{h^2}{(z-a)^2} + \cdots + \frac{h^{N-1}}{(z-a)^{N-1}} = \left( \frac{z-a-h}{z-a} \right)^{-1} \left[ 1 - \frac{h^N}{(z-a)^N} \right]$$

to obtain the exact expression

$$\frac{1}{z-a-h} = \sum_{n=0}^{N-1} \frac{h^n}{(z-a)^{n+1}} + \frac{h^N}{(z-a-h)(z-a)^N}.$$

Substituting this into (7.49), we have

$$f(a+h) = \sum_{n=0}^{N-1} \frac{h^n}{2\pi i} \oint_C \frac{f(z)}{(z-a)^{n+1}}\,dz + \frac{h^N}{2\pi i} \oint_C \frac{f(z)}{(z-a-h)(z-a)^N}\,dz. \tag{7.50}$$

Since the first integral can be replaced by the nth derivative of f at z = a
by means of the Goursat formula (7.38), we have

$$f(a+h) = \sum_{n=0}^{N-1} \frac{h^n}{n!} f^{(n)}(a) + R_N, \tag{7.51}$$

where $R_N$ is the second term on the right-hand side of (7.50). It follows from
(7.51) that if $\lim_{N\to\infty} R_N = 0$, the Taylor series expansion of f(z) around
z = a is obtained successfully. This is indeed the case. As f(z) is analytic
within and on the contour C, the absolute value of $R_N$ is bounded as

$$|R_N| = \frac{|h|^N}{2\pi} \left| \oint_C \frac{f(z)}{(z-a)^N (z-a-h)}\,dz \right| \le \frac{|h|^N M r}{r^N (r - |h|)}, \tag{7.52}$$

where r is the radius of the circle and M is the maximum value of |f| on the
contour. Within the radius r, |h| < r so that

$$\lim_{N\to\infty} R_N = 0.$$

Hence, we have

$$f(a+h) = \sum_{n=0}^{\infty} \frac{h^n}{n!} f^{(n)}(a), \tag{7.53}$$

which holds at any point z = a + h within the circle of radius r. ♣


We note that the series (7.53) converges for any h as long as $|h| < r_c$, since
$R_N$ vanishes as N → ∞ for any value of |h| smaller than $r_c$. Furthermore, as
the inequality (7.52) holds whenever f(z) is analytic within and on the circle
of radius r, the radius of convergence $r_c$ can extend up to the singularity
nearest to z = a. When the extending circle goes beyond the
nearest singular point, the inequality becomes invalid so that the Taylor series
expansion fails.
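
The remainder estimate used in the proof can be tested on a simple case. The sketch below takes f(z) = 1/(1 - z) about a = 0, for which every Taylor coefficient equals 1 and the maximum of |f| on the circle |z| = r < 1 is 1/(1 - r); the function names and the values h = 0.5, r = 0.8 are assumptions chosen for illustration.

```python
def taylor_partial_sum(h, N):
    """Sum of the first N Taylor terms of 1/(1 - z) about a = 0 at z = h.
    Here f^(n)(0)/n! = 1 for every n."""
    return sum(h ** n for n in range(N))


def remainder_bound(h, N, r):
    """Bound |h|^N M r / (r^N (r - |h|)) for f(z) = 1/(1 - z) and the
    circle |z| = r < 1, on which max|f| = 1/(1 - r)."""
    M = 1.0 / (1.0 - r)
    return abs(h) ** N * M * r / (r ** N * (r - abs(h)))


h, r = 0.5, 0.8
for N in (5, 10, 20, 40):
    actual = abs(1.0 / (1.0 - h) - taylor_partial_sum(h, N))
    print(N, actual, remainder_bound(h, N, r))
```

In each row the actual truncation error stays below the bound, and both shrink geometrically as N grows because |h| < r.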


7.4.4 Apparent Paradoxes 


We have seen that the radius of convergence is determined by the distance to
the nearest singularity. Interestingly, this explains some apparent paradoxes
that occur if we restrict our attention only to values of the series along
the real axis of z.
A familiar example is the Taylor expansion of f(z) = 1/(1 - z) around the
origin:

$$\frac{1}{1-z} = 1 + z + z^2 + \cdots. \tag{7.54}$$

Obviously, both sides of (7.54) “blow up” at z = 1. At z = -1, on the other
hand, the right-hand side diverges, whereas the left-hand side has a finite
value of 1/2. Notably, this apparent paradox occurs at all points represented
by $z = e^{i\theta}$ with θ ≠ 0, i.e., at any point other than z = 1 on the unit circle
surrounding the origin. The
reason for this is clear from the point of view of the radius of convergence.
(We leave it to the reader.)
Another example is

$$f(z) = e^{-1/z^2}.$$

Observe that, along the real axis, $f^{(n)}(0) = 0$ for any n = 0, 1, ..., so if one
puts this result blindly into the Taylor formula around z = 0, one obtains
apparent nonsense as $e^{-1/z^2} = 0$. The point here is that z = 0 is a
singularity, where the Taylor series expansion is prohibited.

These two examples suggest the importance of realizing the difference between
the series representing a function and “the function itself.” A power
series, such as a Taylor series, has only a limited range of representation
characterized by the radius of convergence. Beyond this range, the power series
is unable to represent the function. For example, the function considered in
(7.54),

$$f(z) = \frac{1}{1-z}, \tag{7.55}$$

exists and is analytic everywhere except at z = 1, but its power series around
z = 0, given by

$$1 + z + z^2 + \cdots,$$

exists and represents f only within the unit circle centered at the origin (i.e.,
|z| < 1). The region in which a power series reproduces its original function is
dependent on the explicit form of the series expansion. In fact, an alternative
series expansion of (7.55) around z = 3 is given by

$$-\frac{1}{2} + \frac{z-3}{4} - \frac{(z-3)^2}{8} + \cdots,$$

which exists and represents (7.55) only within the circle of radius 2 centered
at z = 3. We thus conclude that power series (including Taylor’s, Laurent’s,
and others) are not to be regarded as a versatile mold by means of which
one can cast a complete copy of the function. Each piece of the mold can
reproduce the behavior of f only within the region where the series converges,
but gives no indication of the shape of f beyond its range.
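
The limited range of each "piece of the mold" can be seen by summing the two expansions of 1/(1 - z) at sample points. This is a small sketch; the function names and the test points 0.5 and 2.5 are chosen here for illustration only.

```python
def series_about_0(z, terms):
    """Partial sum of 1 + z + z^2 + ... (valid for |z| < 1)."""
    return sum(z ** n for n in range(terms))


def series_about_3(z, terms):
    """Partial sum of -(1/2) * sum_n (-(z - 3)/2)^n (valid for |z - 3| < 2)."""
    return -0.5 * sum((-(z - 3.0) / 2.0) ** n for n in range(terms))


def exact(z):
    return 1.0 / (1.0 - z)


for z in (0.5, 2.5):
    print(z, exact(z), series_about_0(z, 60), series_about_3(z, 60))
```

At z = 0.5 only the expansion about the origin reproduces the function, while at z = 2.5 only the expansion about z = 3 does; outside its own circle each partial sum grows without bound as more terms are added.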


7.4.5 Laurent Series 


When expanding a function f(z) around its singular point z = a, Taylor’s 
expansion is obviously not suitable but we can obtain an alternative expansion 
that is valid for a singular point. The latter kind of expansion is called a 
Laurent series expansion. Laurent series enter quite often in mathematical 
analyses of physical problems, where functions to be considered have a finite 
number of singularities. 




♠ Laurent series expansions:
Let f(z) be analytic within and on a closed contour C except at a point
z = a enclosed by C. Then, f(z) can be expanded around z = a as

$$f(z) = \sum_{n=-\infty}^{\infty} c_n (z-a)^n, \tag{7.56}$$

with the definition

$$c_n = \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z-a)^{n+1}}\,dz. \tag{7.57}$$

The series (7.56) with the constants (7.57) is called the Laurent series
expansion of f(z).


Fig. 7.12. Conversion of a closed contour C into $C_1 + C_2$ so as not to involve the
singularity of f(z) at z = a in it


Proof The trick to deriving a Laurent series expansion is to use the contour
$C_1 + C_2$ illustrated in Fig. 7.12 such that its interior does not contain the
singular point of f(z) at z = a (i.e., f is analytic within and on the contour).
As is indicated, the original contour C can be reduced to two circular contours
$C_1$ and $C_2$ encircling z = a counterclockwise and clockwise, respectively.
Applying Cauchy’s integral formula, we have

$$f(a+h) = \frac{1}{2\pi i} \oint_{C_1 + C_2} \frac{f(z)}{z-a-h}\,dz = \frac{1}{2\pi i} \oint_{C_1} \frac{f(z)}{z-a-h}\,dz + \frac{1}{2\pi i} \oint_{C_2} \frac{f(z)}{z-a-h}\,dz. \tag{7.58}$$

Note that |z - a| > |h| on the contour $C_1$ and |z - a| < |h| on $C_2$. We thus
have

$$\frac{1}{z-a-h} = \frac{1}{z-a} \cdot \frac{1}{1 - \frac{h}{z-a}} = \frac{1}{z-a} \sum_{n=0}^{\infty} \left( \frac{h}{z-a} \right)^n \quad \text{on } C_1 \tag{7.59}$$

and

$$\frac{1}{z-a-h} = -\frac{1}{h} \cdot \frac{1}{1 - \frac{z-a}{h}} = -\frac{1}{h} \sum_{n=0}^{\infty} \left( \frac{z-a}{h} \right)^n \quad \text{on } C_2. \tag{7.60}$$

The substitution of these two expressions into (7.58) yields

$$f(a+h) = \frac{1}{2\pi i} \left[ \sum_{n=0}^{\infty} h^n \oint_{C_1} \frac{f(z)}{(z-a)^{n+1}}\,dz - \sum_{n=1}^{\infty} h^{-n} \oint_{C_2} f(z) (z-a)^{n-1}\,dz \right]. \tag{7.61}$$

The order of integration and summation within the square brackets can be
reversed since the infinite series involved in the integrals converge uniformly
on the respective contours. Eventually, we obtain

$$f(a+h) = \sum_{n=-\infty}^{\infty} c_n h^n; \quad c_n = \frac{1}{2\pi i} \oint \frac{f(z)}{(z-a)^{n+1}}\,dz. \tag{7.62}$$

Here, the contour for the coefficients $c_n$ is $C_1$ in the positive
(counterclockwise) direction for n ≥ 0 and, for n < 0, $C_2$ with its direction
reversed so that it is also traversed counterclockwise; this reversal absorbs
the minus sign in (7.61). The series (7.62) is
what we call the Laurent series expansion of f(z) around the singular point
z = a. Note that $C_1$ can be taken as the contour for all values of n.
This is because the integrand is analytic
in the region between $C_1$ and $C_2$, which allows us to expand the size of the
contour $C_2$ until it coincides with the larger contour $C_1$. ♣
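
The coefficient integral can be evaluated numerically on a circle lying in the annulus of interest. The sketch below is an illustration under stated assumptions: the helper `laurent_coefficient`, the trapezoidal rule, and the sample function 1/(z(1 - z)) with radii 0.5 and 2 are all choices made here, not taken from the text.

```python
import cmath
import math


def laurent_coefficient(f, a, n, radius, samples=512):
    """c_n = (1/(2*pi*i)) * closed integral of f(z)/(z - a)^(n+1) dz,
    taken counterclockwise on the circle |z - a| = radius."""
    total = 0j
    dtheta = 2.0 * math.pi / samples
    for k in range(samples):
        w = cmath.exp(1j * k * dtheta)
        z = a + radius * w
        dz = 1j * radius * w * dtheta
        total += f(z) / (z - a) ** (n + 1) * dz
    return total / (2j * math.pi)


def f(z):
    return 1.0 / (z * (1.0 - z))


# Circle inside 0 < |z| < 1: coefficients of 1/z + 1 + z + ...
inner = [laurent_coefficient(f, 0.0, n, radius=0.5) for n in (-2, -1, 0, 1)]
# Circle inside |z| > 1: coefficients of -1/z^2 - 1/z^3 - ...
outer = [laurent_coefficient(f, 0.0, n, radius=2.0) for n in (-3, -2, -1, 0)]
print(inner)
print(outer)
```

The same formula yields different coefficient sets depending on which annulus the contour lies in, since the contour cannot be moved across the singularity at z = 1.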


7.4.6 Regular and Principal Parts 


An important property of Laurent series is the series resolution. To see this,
we rewrite (7.62) as follows:

$$f(a+h) = \sum_{n=0}^{\infty} c_n h^n + \sum_{n=1}^{\infty} \frac{c_{-n}}{h^n}. \tag{7.63}$$

The first term in (7.63) converges everywhere within the outer circle of
convergence, whereas the second term converges anywhere outside the inner circle.
This means that the Laurent series expansion resolves the original function
f(z) into two parts: one that is analytic within the outer circle of
convergence, and the other that is analytic outside the inner circle of convergence.
Obviously, each part is analytic over different portions of the complex plane.

The part of the Laurent series consisting of nonnegative powers of h is called
the regular part. The other part, consisting of negative powers, is called the
principal part. Either part (or both) may terminate at a finite degree of the
sum or be identically zero. In particular, when the principal part is identically
zero, f(z) is analytic at z = a, and the Laurent series is identical with
the Taylor series.


Remark. At first glance, the regular part exhibited in (7.63) resembles the
Taylor series. However, this is not the case; the nth coefficient cannot
generally be associated with $f^{(n)}(a)$ because the latter may not exist. In most
applications, f(z) is not analytic at z = a.


7.4.7 Uniqueness of Laurent Series 


Taylor and Laurent series allow us to express an analytic function as a power 
series. For a Taylor series of f(z), the expansion is routine because the
coefficient of its nth term is simply $f^{(n)}(z_0)/n!$, where $z_0$ is the center of the circle
of convergence. In contrast, for the case of a Laurent series expansion, the
nth coefficient is not (in general) easy to evaluate. It can usually be found by 
inspection and certain manipulations of other known series, but if we use such 
an intuitive approach to determine the coefficients, we cannot be sure that 
the result we obtain is correct. The following theorem addresses this issue. 


♠ Theorem:
If the series

$$\sum_{n=-\infty}^{\infty} a_n (z - z_0)^n \tag{7.64}$$

converges to f(z) at all points in some annular region around $z_0$, then it is
the unique Laurent series expansion of f(z) in that region.


Proof Multiply both sides of (7.64) by

$$\frac{1}{2\pi i (z - z_0)^{k+1}},$$

integrate the result along a contour C in the annular region that encircles
$z_0$ once, and use the easily verifiable fact that

$$\frac{1}{2\pi i} \oint_C \frac{dz}{(z - z_0)^{k-n+1}} = \delta_{kn}$$

to obtain

$$a_k = \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - z_0)^{k+1}}\,dz.$$

Thus, the coefficient $a_k$ in the power series (7.64) is precisely the coefficient in
the Laurent series of f(z) given in (7.57), and the two must be identical. ♣


Remark. A Laurent series is unique only for a specified annulus. In general, a
function f(z) can possess two or more entirely different Laurent series about
a given point, valid for different (nonoverlapping) regions. For instance,

$$\frac{1}{z(1-z)} = \begin{cases} \dfrac{1}{z} + 1 + z + z^2 + \cdots, & 0 < |z| < 1, \\[2mm] -\dfrac{1}{z^2} - \dfrac{1}{z^3} - \dfrac{1}{z^4} - \cdots, & 1 < |z| < \infty. \end{cases}$$


7.4.8 Techniques for Laurent Expansion 


The following examples illustrate several useful techniques for the construction 
of Taylor and Laurent series. 


(a) Use of geometric series 


Suppose that a function

$$f(z) = \frac{1}{z-a} \tag{7.65}$$

fails to be analytic at z = a. We would like to obtain the series expansions of
f(z) around z = 0. First we note that for |z| < |a|, f(z) reads

$$\frac{1}{z-a} = -\frac{1}{a} \cdot \frac{1}{1 - z/a} = -\frac{1}{a} \sum_{n=0}^{\infty} \left( \frac{z}{a} \right)^n = -\sum_{n=0}^{\infty} \frac{z^n}{a^{n+1}} \quad \text{for } |z| < |a|. \tag{7.66}$$

This is obviously the Taylor series expansion of f(z) around the point z = 0.
That is, for |z| < |a|, the Laurent series of f(z) given in (7.65) becomes
identical to its Taylor series. Nevertheless this is not the case for |z| > |a|,
since the radius of convergence of (7.66) is R = |a|. Hence, we should also
evaluate the Laurent series around z = 0 that is valid for |z| > |a|. In a
similar manner as above, we obtain

$$\frac{1}{z-a} = \frac{1}{z} \sum_{n=0}^{\infty} \left( \frac{a}{z} \right)^n = \sum_{n=0}^{\infty} \frac{a^n}{z^{n+1}} \quad \text{for } |z| > |a|. \tag{7.67}$$

Expansions (7.66) and (7.67) both serve as the Laurent series expansions of
f(z), although the regions of convergence are different from one
another.




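Remark. The function f(z) given in (7.65) can be expanded by this method
about any point z = b. Indeed, write

$$f(z) = \frac{1}{z-a} = \frac{1}{(z-b) - (a-b)} \quad (b \neq a).$$

Then, either

$$f(z) = -\sum_{n=0}^{\infty} \frac{(z-b)^n}{(a-b)^{n+1}} \quad (|z-b| < |a-b|)$$

or

$$f(z) = \sum_{n=0}^{\infty} \frac{(a-b)^n}{(z-b)^{n+1}} \quad (|z-b| > |a-b|).$$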


(b) Rational fraction decomposition 


Next we assume a function

$$f(z) = \frac{1}{z^2 - (2+i)z + 2i}.$$

The roots of the denominator are z = i and z = 2, which are the only points
at which f(z) fails to be analytic. Hence, f(z) has a Taylor series about z = 0
that is valid for |z| < 1 and two Laurent series about z = 0 that are valid for
1 < |z| < 2 and |z| > 2. To obtain them, we use the identities

$$z^2 - (2+i)z + 2i = (z-i)(z-2)$$

and

$$\frac{1}{(z-i)(z-2)} = \frac{1}{2-i} \left( \frac{1}{z-2} - \frac{1}{z-i} \right).$$

When we want the Laurent series of f(z) around z = 0 that is valid for
1 < |z| < 2, it suffices to expand the function 1/(z - 2) in the Taylor series
about z = 0 [see (a) above] and then expand 1/(z - i) in the Laurent series
about z = 0 that is valid for |z| > 1. (The latter series is also valid for
1 < |z| < 2.) If these two series are subtracted and the result is multiplied
by 1/(2 - i), we obtain a series for f(z)
that is valid for 1 < |z| < 2, which is the desired Laurent series.
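
The recipe just described can be carried out numerically. In the sketch below, the two geometric expansions are truncated after a fixed number of terms and combined; the function names, the truncation length, and the test point z = 1.2 + 0.9i (for which |z| = 1.5) are assumptions made for illustration.

```python
def f_exact(z):
    return 1.0 / (z * z - (2 + 1j) * z + 2j)


def f_laurent_annulus(z, terms=200):
    """Truncated Laurent series of f about z = 0 for 1 < |z| < 2."""
    # 1/(z - 2) as a Taylor series in z (requires |z| < 2)
    taylor_part = -sum(z ** n / 2.0 ** (n + 1) for n in range(terms))
    # 1/(z - i) as a series in 1/z (requires |z| > 1)
    inverse_part = sum((1j) ** n / z ** (n + 1) for n in range(terms))
    return (taylor_part - inverse_part) / (2 - 1j)


z = 1.2 + 0.9j   # |z| = 1.5 lies inside the annulus
print(abs(f_laurent_annulus(z) - f_exact(z)))
```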


(c) Differentiation 


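The method used in (b) fails for functions with a double root in the
denominator such that

$$f(z) = \frac{1}{(1-z)^2}.$$

Among alternative methods, the simplest one is the differentiation

$$\frac{1}{(1-z)^2} = \frac{d}{dz} \left( \frac{1}{1-z} \right).$$

From the discussions regarding the earlier case (a), the function 1/(1 - z) is
seen to be represented by

$$\frac{1}{1-z} = \begin{cases} \displaystyle \sum_{n=0}^{\infty} z^n, & |z| < 1, \\ \displaystyle -\sum_{n=0}^{\infty} \frac{1}{z^{n+1}}, & |z| > 1. \end{cases}$$

Hence, term-by-term differentiations yield

$$\frac{1}{(1-z)^2} = \begin{cases} \displaystyle \sum_{n=0}^{\infty} (n+1) z^n, & |z| < 1, \\ \displaystyle \sum_{n=0}^{\infty} \frac{n+1}{z^{n+2}}, & |z| > 1. \end{cases}$$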


Exercises 


1. Let f(z) be an entire function. Employ the Taylor series expansion to
show that the function defined by

$$g(z) = \begin{cases} \dfrac{f(z) - f(a)}{z - a}, & z \neq a, \\ f'(a), & z = a \end{cases}$$

is also entire.


Solution: For z ≠ a, we employ the Taylor series expansion of
f(z) to obtain

$$g(z) = f'(a) + \frac{f''(a)}{2!} (z-a) + \frac{f'''(a)}{3!} (z-a)^2 + \cdots. \tag{7.68}$$

By the definition of g, the representation (7.68) is valid at z = a as well.
Hence, g is equal to an everywhere-convergent power series and is
thus an entire function. ♣


2. If f is entire and if for some integer k ≥ 0 there exist positive constants
A and B such that

$$|f(z)| \le A + B|z|^k,$$

then f is a polynomial of degree k at most. Prove it.


Solution: Note that the case k = 0 is the original Liouville
theorem. To prove the case of k > 0, we employ mathematical
induction, and consider

$$g(z) = \begin{cases} \dfrac{f(z) - f(0)}{z}, & z \neq 0, \\ f'(0), & z = 0, \end{cases} \tag{7.69}$$

where f(z) is assumed to obey the conditions noted above. By
Exercise 1, g is entire. In addition, by the hypothesis on f we have

$$|g(z)| \le C + D|z|^{k-1}$$

for some positive constants C and D.
Hence, by induction, g is a polynomial of degree k - 1 at most, and
then f is a polynomial of degree k at most owing to the definition
(7.69). This completes the proof. ♣


3. Find the Laurent series of the multivalued logarithmic function given by

$$f(z) = \log(1+z) = \log|1+z| + i \arg(1+z).$$


Solution: The branch cut (see Sect. 8.2.3) is set so as to extend
from -∞ to -1 along the real axis. Hence, log(1 + z) is analytic
within the circle |z| = 1. Since

$$\frac{d}{dz} \log(1+z) = \frac{1}{1+z},$$

we may expand

$$\frac{1}{1+z} = 1 - z + z^2 - z^3 + \cdots \quad (|z| < 1).$$

Then, term-by-term integration yields

$$\log(1+z) = \int_0^z \frac{d\xi}{1+\xi} + C = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots + C \quad (|z| < 1),$$

where C is the constant of integration. Since log 1 = 0, it follows
that C = 0 and

$$\log(1+z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} z^n \quad (|z| < 1).$$

Other branches of log(1 + z) have the same series except for
different values of the constant C. ♣




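4. Find the power series representation of f(z) about z = 0 that satisfies the
differential equation

$$f'(z) + f(z) = 0 \quad \text{with } f(0) = 1. \tag{7.70}$$


Solution: Let $f(z) = 1 + \sum_{n=1}^{\infty} a_n z^n$. Then we have

$$f'(z) = \sum_{n=1}^{\infty} n a_n z^{n-1} = a_1 + \sum_{n=1}^{\infty} (n+1) a_{n+1} z^n.$$

Substitute this into (7.70) to obtain

$$1 + a_1 = 0 \quad \text{and} \quad a_n + (n+1) a_{n+1} = 0 \quad \text{for } n \ge 1.$$

The latter result yields

$$a_n = \left( -\frac{1}{n} \right) a_{n-1} = (-1)^2 \frac{1}{n(n-1)} a_{n-2} = \cdots = (-1)^{n-1} \frac{1}{n(n-1) \cdots 2} a_1.$$

Hence, we have $a_n = (-1)^n / n!$, so that

$$f(z) = \sum_{n=0}^{\infty} \frac{(-1)^n}{n!} z^n = e^{-z}.$$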


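5. Let $f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n$ be analytic for |z - a| < R. Prove that

$$\frac{1}{2\pi} \int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta = \sum_{n=0}^{\infty} |c_n|^2 r^{2n} \quad \text{for any } r < R.$$

Then show that

$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} \le M(r)^2, \tag{7.71}$$

in which $M(r) = \max_{|z-a|=r} |f(z)|$. The result (7.71) is called Gutzmer’s
theorem.


Solution: From the assumption, it follows that, on the circle $z = a + re^{i\theta}$,

$$|f(z)|^2 = \sum_{n=0}^{\infty} c_n (z-a)^n \sum_{m=0}^{\infty} \bar{c}_m (\bar{z} - \bar{a})^m = \sum_{n,m=0}^{\infty} c_n \bar{c}_m r^{n+m} e^{i(n-m)\theta}.$$

This infinite series converges uniformly on the circle |z - a| =
r < R, which allows us to interchange the order of integration and
summation as expressed by

$$\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta = \sum_{n,m=0}^{\infty} c_n \bar{c}_m r^{n+m} \int_0^{2\pi} e^{i(n-m)\theta}\,d\theta.$$

The terms on the right-hand side vanish when n ≠ m since the integral equals
zero. Hence, we have

$$\int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta = \sum_{n=0}^{\infty} |c_n|^2 r^{2n} \times 2\pi,$$

which is equivalent to the desired equation. Furthermore, since
$|f(a + re^{i\theta})| \le M(r)$, we have

$$\sum_{n=0}^{\infty} |c_n|^2 r^{2n} = \frac{1}{2\pi} \int_0^{2\pi} |f(a + re^{i\theta})|^2\,d\theta \le \frac{1}{2\pi} \int_0^{2\pi} M(r)^2\,d\theta = M(r)^2.$$ ♣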
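
Gutzmer's relation can be verified numerically for a polynomial, where the coefficient sum is finite. The sketch below uses a hypothetical example f(z) = 1 + 2z + 3z^2 about a = 0 and r = 0.5; the names and the sample count are assumptions chosen for illustration.

```python
import cmath
import math

coeffs = [1.0, 2.0, 3.0]   # f(z) = 1 + 2 z + 3 z^2, expanded about a = 0


def f(z):
    return sum(c * z ** n for n, c in enumerate(coeffs))


def mean_square_on_circle(r, samples=64):
    """(1/(2*pi)) * integral over theta of |f(r e^{i theta})|^2 (trapezoidal rule)."""
    return sum(abs(f(r * cmath.exp(2j * math.pi * k / samples))) ** 2
               for k in range(samples)) / samples


r = 0.5
lhs = mean_square_on_circle(r)
rhs = sum(abs(c) ** 2 * r ** (2 * n) for n, c in enumerate(coeffs))
M = max(abs(f(r * cmath.exp(2j * math.pi * k / 64))) for k in range(64))
print(lhs, rhs, M ** 2)
```

The mean of |f|^2 over the circle matches the coefficient sum, and both stay below the square of the largest sampled modulus on the circle.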


7.5 Applications in Physics and Engineering 


7.5.1 Fluid Dynamics 


This section demonstrates the effectiveness of using complex function theory
for analyzing fluid dynamics in a two-dimensional plane. The primary aim is
to derive the Kutta-Joukowski theorem (see Sect. 7.5.2), which describes
the lift force exerted on a solid material placed in a uniform flow. Before
proceeding, we introduce the terminology and several basic concepts that pertain
to fluid dynamics.

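The fundamental quantities that characterize a two-dimensional fluid flow
are the velocity $\boldsymbol{v} = u\boldsymbol{e}_x + v\boldsymbol{e}_y$ and the vorticity $\boldsymbol{\omega} = \nabla \times \boldsymbol{v}$, both of which are
vector-valued functions of the position $\boldsymbol{r}$. Here, we restrict our attention to
the case of an irrotational ($\boldsymbol{\omega} = 0$) and incompressible ($\nabla \cdot \boldsymbol{v} = 0$) fluid.
The assumption $\boldsymbol{\omega} = \nabla \times \boldsymbol{v} = 0$ allows us to define an appropriate function $\Phi(x, y)$
such that

$$\boldsymbol{v} = \nabla \Phi, \tag{7.72}$$

since $\nabla \times (\nabla f) = 0$ for any twice-differentiable function $f(x,y)$ in the $x$-$y$ plane. The
function $\Phi(x, y)$ defined by (7.72) is called the velocity potential. Further,
our assumption of $\nabla \cdot \boldsymbol{v} = 0$ implies that

$$\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y} = 0,$$

which in turn suggests the presence of a function $\Psi(x, y)$ defined by

$$u = \frac{\partial \Psi}{\partial y}, \quad v = -\frac{\partial \Psi}{\partial x} \tag{7.73}$$

that satisfies the two-dimensional Laplace equation $\nabla^2 \Psi = 0$. Such a function
$\Psi(x, y)$ is called a stream function.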




Remark. The name stream function originates from the fact that the curves
$\Psi(x,y) = \text{const.}$ in the $x$-$y$ plane represent streamlines. This is shown
by noting that if $d\Psi = 0$, we have

$$d\Psi = \frac{\partial \Psi}{\partial x} dx + \frac{\partial \Psi}{\partial y} dy = -v\,dx + u\,dy = 0,$$

so that $dx/u = dy/v$, which implies that $d\boldsymbol{r}$ is parallel to $\boldsymbol{v}$.


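From (7.72) and (7.73), it follows that the components of the velocity $\boldsymbol{v}$ are
expressed as

$$u = \frac{\partial \Phi}{\partial x} = \frac{\partial \Psi}{\partial y}, \quad v = \frac{\partial \Phi}{\partial y} = -\frac{\partial \Psi}{\partial x}.$$

These are the Cauchy-Riemann relations for the pair $(\Phi, \Psi)$, which allows us to introduce the concept of a complex velocity potential
$f(z)$ in the complex plane:

$$f(z) = \Phi(z) + i\Psi(z) \quad \text{with} \quad z = x + iy. \tag{7.74}$$

Note that since $f(z)$ is analytic,

$$\frac{\partial f}{\partial x} = \frac{df}{dz} = u - iv = |\boldsymbol{v}| e^{-i\theta},$$

where $\theta$ is the angle between $\boldsymbol{v}$ and the $x$-axis; i.e., the absolute value of the derivative $|df/dz|$ gives the magnitude of the
velocity $|\boldsymbol{v}|$. Furthermore, the contour integral of $df/dz$ has important physical
implications. Given a closed contour $C$ placed on a two-dimensional flow, we
have

$$\oint_C \frac{df}{dz} \, dz = \Gamma(C) + iQ(C),$$

where

$$\Gamma(C) = \oint_C d\Phi = \oint_C (u\,dx + v\,dy) = \oint_C \boldsymbol{v} \cdot d\boldsymbol{r},$$

$$Q(C) = \oint_C d\Psi = \oint_C (u\,dy - v\,dx) = \oint_C (\boldsymbol{v} \times d\boldsymbol{r})_z.$$

Hence, the integrals $\Gamma(C)$ and $Q(C)$ represent the circulation (or rotation)
along $C$ and the net flow of fluid across $C$, respectively.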
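To make the meaning of $\Gamma(C)$ and $Q(C)$ concrete, the following Python sketch integrates a model complex velocity $w = df/dz$ around a circle. The model $w(z) = U + (\Gamma + iQ)/(2\pi i z)$, the numerical values, and the helper `contour_integral` are assumptions chosen for illustration; the quadrature is a plain trapezoidal rule on the parametrized circle.

```python
import cmath
import math


def contour_integral(g, center=0j, radius=1.0, samples=2000):
    """Trapezoidal approximation of the counterclockwise integral of g(z) dz over a circle."""
    total = 0j
    for k in range(samples):
        e = cmath.exp(2j * math.pi * k / samples)
        # z = center + radius * e, and dz = i * radius * e * dtheta
        total += g(center + radius * e) * 1j * radius * e
    return total * (2 * math.pi / samples)


U, GAMMA, Q = 1.5, 2.0, -0.7            # hypothetical flow parameters


def w(z):
    """Model df/dz = u - iv: uniform flow plus a combined vortex and source at z = 0."""
    return U + (GAMMA + 1j * Q) / (2j * math.pi * z)


enclosing = contour_integral(w, center=0.3 + 0.2j, radius=2.0)   # contour around z = 0
excluding = contour_integral(w, center=5.0 + 0j, radius=1.0)     # contour away from z = 0
print(enclosing, excluding)
```

The real and imaginary parts of the first result reproduce the assumed circulation and flux, while a contour that does not enclose the singular point yields zero for both.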


7.5.2 Kutta—Joukowski Theorem 


We are now ready to study the Kutta-Joukowski theorem, which describes 
the lift force in a two-dimensional flow. The lift force is a component of the 
fluid dynamic force that is perpendicular to the flow direction. It is the lift 
force that makes it possible for airplanes, helicopters, sail boats, etc. to move 
against the gravitational force or water currents. 



Fig. 7.13. Spatial configuration of material placed into a two-dimensional uniform
flow with speed $U$ showing the components $F_x$, $F_y$ of the flow-induced force $\boldsymbol{F}$ acting
on the material


♠ Kutta-Joukowski theorem:
The lift force $F_y$ that acts on a material placed in a uniform flow of speed $U$ in
the $x$-direction is given by

$$F_y = -\rho U \Gamma(C), \tag{7.75}$$

where $\rho$ is the mass density of the fluid and $\Gamma(C)$ is the circulation
along a closed contour $C$ surrounding the material (see
Fig. 7.13).

The lift force is generated in accordance with Bernoulli's theorem and the
law of conservation of momentum. Both of these principles are used to
explain the mechanism responsible for the occurrence of the lift force in a
uniform flow, which is given by the Blasius formula (see Sect. 7.5.3):

$$F = \frac{i\rho}{2} \oint_C w^2 \, dz \quad \text{with} \quad F = F_x - iF_y, \quad w = \frac{df}{dz},$$

which plays a key role in the proof of the Kutta-Joukowski theorem, as shown
below.


Proof (of the Kutta-Joukowski theorem). Assume a uniform flow oriented along
the $x$-axis. Then the function $w = df/dz$ is analytic outside the material and satisfies the relation

$$\lim_{z \to \infty} w = U = \text{const.}$$

Hence, $w$ can be expanded at points sufficiently far from the origin:

$$w = U + \frac{k_1}{z} + \frac{k_2}{z^2} + \cdots \quad (z \to \infty), \tag{7.76}$$

which implies that

$$f = Uz + k_1 \log z - \frac{k_2}{z} + \cdots \quad (z \to \infty), \tag{7.77}$$

$$w^2 = U^2 + \frac{2Uk_1}{z} + (k_1^2 + 2Uk_2) \frac{1}{z^2} + \cdots \quad (z \to \infty). \tag{7.78}$$

From (7.77) we have

$$\oint_C \frac{df}{dz} \, dz = 2\pi i k_1 = \Gamma(C) + iQ(C), \tag{7.79}$$

and substituting (7.78) into the Blasius formula expressed by $F_x - iF_y = (i\rho/2) \oint_C w^2 \, dz$, we obtain

$$F_x - iF_y = \frac{i\rho}{2} \cdot 2\pi i \cdot 2Uk_1 = -2\pi \rho U k_1. \tag{7.80}$$

Combining (7.79) and (7.80) yields

$$F_x - iF_y = \rho U(-Q + i\Gamma),$$

i.e.,

$$F_x = -\rho U Q, \quad F_y = -\rho U \Gamma. \tag{7.81}$$

Of the two results above, it is the second one regarding $F_y$ that states the
theorem. ♣


Remark. The first equation in (7.81) indicates that $F_x = 0$ if $Q = 0$; i.e., no
force in the direction of the stream acts on a material inside the closed
contour $C$ if no source is located interior to $C$. This is precisely the case for
an ideal flow without any viscosity.
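The theorem can be illustrated numerically by evaluating the Blasius integral for a standard model flow. The sketch below assumes the complex velocity $w(z) = U(1 - a^2/z^2) + \Gamma/(2\pi i z)$ of a uniform stream past a circular cylinder of radius $a$ with circulation $\Gamma$; this model, the parameter values, and the helper `contour_integral` are our own illustrative choices rather than part of the proof.

```python
import cmath
import math


def contour_integral(g, radius=1.0, samples=2000):
    """Trapezoidal approximation of the counterclockwise integral of g(z) dz over |z| = radius."""
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(2j * math.pi * k / samples)
        total += g(z) * 1j * z          # dz = i * z * dtheta on a circle centered at 0
    return total * (2 * math.pi / samples)


RHO, U, GAMMA, A = 1.2, 3.0, -5.0, 1.0  # hypothetical density, speed, circulation, radius


def w(z):
    """Assumed model: uniform flow past a cylinder of radius A with circulation GAMMA."""
    return U * (1 - A ** 2 / z ** 2) + GAMMA / (2j * math.pi * z)


blasius = 0.5j * RHO * contour_integral(lambda z: w(z) ** 2, radius=2.0)
force_x = blasius.real                  # blasius approximates F_x - i F_y
force_y = -blasius.imag
print(force_x, force_y, -RHO * U * GAMMA)
```

The drag component comes out as zero to rounding error and the lift component agrees with $-\rho U \Gamma$, whatever contour radius larger than $a$ is used.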


7.5.3 Blasius Formula 


We conclude this section by explaining the Blasius formula, which is important
for the proof of the Kutta-Joukowski theorem discussed above. Consider
a two-dimensional flow of irrotational and incompressible fluid and assume
that a solid material is placed inside a closed contour $C$ encircling a portion
of the fluid. Evidently, a force $\boldsymbol{F}$ from the flow is exerted on the material.
Hence, the law of the conservation of momentum within the contour $C$
is written as

$$\boldsymbol{F} + \oint_C d\boldsymbol{G} = 0,$$

where $d\boldsymbol{G}$ represents the sum of the momenta that pass through a line element
$ds$ of the closed contour $C$ per unit time. It is given by

$$d\boldsymbol{G} = p\boldsymbol{n}\,ds + \rho \boldsymbol{v} v_n\,ds, \tag{7.82}$$

where $p$ is the fluid pressure, $\boldsymbol{n}$ is the unit vector normal to the contour $C$, $\rho$
is the density of the fluid, and $v_n = \boldsymbol{v} \cdot \boldsymbol{n}$. The first and second terms on the
right-hand side of (7.82) represent the impulse transmitted to the interior of
$C$ through $ds$ and the momentum carried by the volume of fluid passing through $ds$, respectively. Using
the stream function $\Psi$, we rewrite (7.82) as

$$d\boldsymbol{G} = p\boldsymbol{n}\,ds + \rho \boldsymbol{v}\,d\Psi, \tag{7.83}$$

since $d\Psi = v_n\,ds$.
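In order to obtain the complex-number representation of (7.83), we denote
by $dz$ an infinitesimal vector having length $ds$ and a direction normal to $\boldsymbol{n}$.
We then have

$$dz = i(n_x + in_y)\,ds.$$

When we apply this relation to (7.83), the quantity $d\boldsymbol{G}$ is expressed as

$$dG_x + i\,dG_y = -ip\,dz + \rho \frac{df^*}{dz^*} \frac{df - df^*}{2i}, \tag{7.84}$$

where we use $u + iv = (df/dz)^* = df^*/dz^*$ and consider $d\Psi$ to be the imaginary part of $df$. The pressure $p$ is known
to correlate with $f$ via Bernoulli's theorem, which is expressed by

$$p = p_0 - \frac{\rho}{2} |\boldsymbol{v}|^2 = p_0 - \frac{\rho}{2} \frac{df}{dz} \frac{df^*}{dz^*}, \tag{7.85}$$

where $p_0$ is the pressure at a position far from the material (i.e., $z \to \infty$). It
then follows from (7.84) and (7.85) that

$$dG_x + i\,dG_y = -ip_0\,dz + \frac{i\rho}{2} \frac{df}{dz} \frac{df^*}{dz^*} dz - \frac{i\rho}{2} \frac{df^*}{dz^*} \left( \frac{df}{dz} dz - \frac{df^*}{dz^*} dz^* \right) \tag{7.86}$$

$$= -ip_0\,dz + \frac{i\rho}{2} \left( \frac{df^*}{dz^*} \right)^2 dz^*. \tag{7.87}$$

Since $\oint_C dz = 0$, we finally obtain, after taking the complex conjugate of $F_x + iF_y = -\oint_C (dG_x + i\,dG_y)$,

$$F_x - iF_y = \frac{i\rho}{2} \oint_C \left( \frac{df}{dz} \right)^2 dz, \tag{7.88}$$

which is known as the Blasius formula.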


8 


Singularity and Continuation 


Abstract We devote the first half of this chapter to the essential properties and 
classification of singularities, which are nonanalytic points in a complex plane. We 
then describe analytic continuation, which is a most important concept from a the- 
oretical as well as an applied point of view. Through analytic continuations, we 
observe the interesting fact that the functional form of a complex function may 
undergo various changes depending on the defining region in the complex plane. 


8.1 Singularity 
8.1.1 Isolated Singularities 


A singularity of a complex function $f(z)$ is any point where it is not analytic.
In particular, the point $z = a$ is called an isolated singularity if and only
if $f(z)$ is analytic in some neighborhood of $z = a$ but not at $z = a$ itself. Most singularities
we have encountered so far in this text were isolated singularities. However,
we will see later that there are singularities that are not isolated.

When $z = a$ is an isolated singularity of $f(z)$, it is classified as follows:

1. A removable singularity if and only if $f(z)$ is finite throughout a neighborhood
of $z = a$, except possibly at $z = a$ itself.

2. A pole of order $m$ ($m = 1, 2, \cdots$) if and only if $(z-a)^m f(z)$ but not
$(z-a)^{m-1} f(z)$ is analytic at $z = a$. In this case, $\lim_{z \to a} |f(z)| = \infty$ no
matter how $z$ approaches $z = a$.

3. An essential singularity if and only if the Laurent series of $f(z)$ around
$z = a$ has an infinite number of terms involving negative powers of $(z-a)$.

Remark. There is an alternative definition of a pole: the point $z = a$ is a pole
of $m$th order of $f(z)$ if and only if $1/f(z)$ is analytic and has a zero of order
$m$ at $z = a$.




The three types of isolated singularities described above can be distinguished
by the extent of the principal part of the Laurent series of $f(z)$ being considered. Let
$f(z)$ have an isolated singularity at $z = a$. Then there is a real number $\delta > 0$
such that $f(z)$ is analytic for $0 < |z-a| < \delta$ but not for $z = a$, which means
that $f(z)$ can be represented by the Laurent series

$$f(z) = \sum_{n=0}^{\infty} c_n (z-a)^n + \sum_{n=1}^{M} c_{-n} (z-a)^{-n}. \tag{8.1}$$

Thus, it suffices to examine the expansion degree $M$ of the principal part, the
second sum in (8.1), in order to determine the type of the isolated singularity
$z = a$.

Case 1. Removable singularities ($M = 0$)

In this case, the principal part is absent so that the Laurent series around
$z = a$ reads

$$f(z) = c_0 + c_1(z-a) + c_2(z-a)^2 + \cdots \quad (z \ne a).$$

Observe that $\lim_{z \to a} f(z) = c_0$, consistent with statement 1 above, which
says that $f(z)$ is finite in a neighborhood of $z = a$. This kind of singularity
can be eliminated by redefining $f(a)$ as $c_0$, which is why we call it removable.


Examples Consider the function 


f= (8.2) 


This yields $\lim_{z \to 0} f(z) = 1$, but the value of $f(0)$ is not defined. Hence, $z = 0$
is a removable singularity of (8.2). In a similar sense, the functions


sin z/z 


e 


are regarded as analytic at z = 0, since this point is the removable singularity 
for each. 


Case 2. Isolated poles ($M$ is finite)

The second type of isolated singularity, for which the principal part reads

$$\sum_{n=1}^{M} \frac{c_{-n}}{(z-a)^n} \quad (c_{-M} \ne 0),$$

is called a pole of order $M$. The order $M$ is the integer that
makes the quantity

$$\lim_{z \to a} (z-a)^M f(z)$$

a finite, nonzero complex number.




Examples 1. The function $f(z) = 1/\sin z$ has a Laurent series valid for $0 <
|z| < \pi$:

$$\frac{1}{\sin z} = \frac{1}{z} + \frac{z}{6} + \frac{7z^3}{360} + \frac{31z^5}{15120} + \cdots,$$

from which it follows that it has a simple pole at the origin.
2. The function $f(z) = 1/z$ has a simple pole at $z = 0$, which is easily seen
by noting that $\lim_{z \to 0} zf(z) = 1$.
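The coefficients of such a Laurent series can be estimated numerically from the contour-integral formula $c_n = (1/2\pi i) \oint f(z)(z-a)^{-n-1} dz$. The Python sketch below applies a trapezoidal rule on the circle $|z| = 1$ to $f(z) = 1/\sin z$; the helper name `laurent_coefficient`, the radius, and the number of sample points are assumptions made for illustration.

```python
import cmath
import math


def laurent_coefficient(f, n, a=0j, radius=1.0, samples=256):
    """Approximate c_n = (1/(2*pi*i)) * integral of f(z) / (z - a)**(n + 1) over |z - a| = radius."""
    total = 0j
    for k in range(samples):
        e = cmath.exp(2j * math.pi * k / samples)
        # With z - a = radius * e, the integrand reduces to f(z) * (z - a)**(-n) / (2*pi) dtheta.
        total += f(a + radius * e) * (radius * e) ** (-n)
    return total / samples


def cosecant(z):
    return 1 / cmath.sin(z)


for n in (-2, -1, 1, 3):
    print(n, laurent_coefficient(cosecant, n))
```

The coefficient of $1/z$ is the only nonvanishing term of the principal part, which is the numerical signature of a simple pole; the values for $n = 1$ and $n = 3$ can be compared with $1/6$ and $7/360$.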


Case 3. Essential singularities ($M = \infty$)

The third type of isolated singularity, an essential singularity, gives rise to an
infinite principal part.

Examples The function $f(z) = e^{1/z}$ has the Laurent series

$$e^{1/z} = 1 + \frac{1}{z} + \frac{1}{2!\,z^2} + \frac{1}{3!\,z^3} + \cdots,$$

which is valid for $|z| > 0$. Since the principal part is infinite, the function has
an essential singularity at $z = 0$.

Remark. An infinite principal part in the Laurent series implies an essential
singularity only when the series is valid for all points in a neighborhood $|z -
a| < \varepsilon$ except $z = a$. For example, the series

$$f(z) = \frac{1}{(z-1)^2} + \frac{1}{(z-1)^3} + \frac{1}{(z-1)^4} + \cdots$$

does not mean that $z = 1$ is an essential singularity of $f(z)$, since the series
converges only if $|z-1| > 1$. It actually represents the function $f(z) = 1/(z^2 -
3z + 2)$ in the region $1 < |z - 1| < \infty$, which evidently has a simple pole at
$z = 1$.


8.1.2 Nonisolated Singularities 


As noted earlier, there are other kinds of singular points that are neither
poles nor essential singularities. For example, neither $\sqrt{z}$ nor $\log z$ can be
expanded near $z = 0$ in a Laurent series; both of them are discontinuous along
an entire line (say, the negative real axis) so that the singular point $z = 0$ is
not isolated. Singularities of this kind, called branch points, are discussed
in Sect. 8.2.3.

Another type of singular behavior of an analytic function occurs when it
possesses an infinite number of isolated singularities converging to some limit
point. Consider, for instance,




$$f(z) = \frac{1}{\sin(1/z)}.$$

The denominator has simple zeros whenever

$$z = \frac{1}{n\pi} \quad (n = \pm 1, \pm 2, \cdots).$$


The function f(z) has simple poles at these points and the sequence of these 
poles converges toward the origin. The origin cannot be regarded as an isolated 
singularity because every one of its neighborhoods contains at least one pole 
(actually an infinite number of poles). 


8.1.3 Weierstrass Theorem for Essential Singularities 


The behavior of a function in the neighborhood of an isolated essential singularity
is different from the cases of other isolated singularities such as poles
and removable singularities. Most remarkable is the fact that $f(z)$ can be
made to approach any arbitrary complex value by choosing an appropriate path
for $z \to a$. For instance, if $z$ approaches zero along the negative real semiaxis,
then the function $f(z) = e^{1/z}$ yields $|f(z)| \to 0$. However, if $z$ approaches zero
along the positive real semiaxis, then $|f(z)| \to \infty$. Finally, if $z$ approaches
zero along the imaginary axis, then $|f(z)|$ remains constant but $\arg f(z)$ oscillates,
and so on. The character of a function near an essential singularity is
described by the following theorem:


♠ Weierstrass theorem:
In any neighborhood of an isolated essential singularity, an analytic
function approaches any given value arbitrarily closely.


Proof We prove the theorem by contradiction. Let $z = a$
be an isolated essential singularity of $f(z)$. We assume for the moment that
for $0 < |z - a| < \varepsilon$, $|f(z) - \gamma|$ with a given complex number $\gamma$ does not become
arbitrarily small. Then, the function $[f(z) - \gamma]^{-1}$ is bounded in the region
$0 < |z - a| < \varepsilon$ so that it is possible to find a constant $M$ such that

$$\left| \frac{1}{f(z) - \gamma} \right| < M \quad \text{for } 0 < |z - a| < \varepsilon.$$

Hence, $[f(z) - \gamma]^{-1}$ is analytic for $|z - a| < \varepsilon$ (or at worst has a removable
singularity at $z = a$) and can be expanded as

$$\frac{1}{f(z) - \gamma} = b_0 + b_1(z-a) + b_2(z-a)^2 + \cdots. \tag{8.3}$$

If $b_0 \ne 0$, then

$$\lim_{z \to a} \frac{1}{f(z) - \gamma} = b_0 \quad \text{so that} \quad \lim_{z \to a} f(z) = \gamma + \frac{1}{b_0}.$$

This means that $z = a$ is at most a removable singularity of $f(z)$, which contradicts our
assumption. Otherwise, if $b_0 = 0$, we have

$$f(z) = \gamma + \frac{1}{(z-a)^k [b_k + b_{k+1}(z-a) + \cdots]},$$

where $b_k$ is the first nonzero coefficient in the series (8.3). This clearly shows
that $z = a$ is a pole of $f(z)$ of $k$th order, which again is inconsistent with
our assumption. Therefore, we conclude that $|f(z) - \gamma|$ with a given $\gamma$ can be
arbitrarily small in the vicinity of an essential singularity $z = a$. Furthermore,
since $\gamma$ is arbitrary, the function $f(z)$ approaches any given complex value
arbitrarily closely. ♣
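For the specific function $f(z) = e^{1/z}$ the statement can be made tangible: given a nonzero target value $\gamma$, the points $z_k = 1/(\mathrm{Log}\,\gamma + 2\pi i k)$ accumulate at the origin and satisfy $f(z_k) = \gamma$. The sketch below, with the hypothetical helper `preimages_of` and an arbitrary target, checks this numerically; it illustrates the theorem for one function and is not a substitute for the proof.

```python
import cmath
import math


def preimages_of(gamma, k_values):
    """Points z_k with exp(1/z_k) = gamma, from 1/z_k = Log(gamma) + 2*pi*i*k (gamma != 0)."""
    principal = cmath.log(gamma)
    return [1 / (principal + 2j * math.pi * k) for k in k_values]


gamma = 2.0 - 3.0j                      # arbitrary nonzero target value
points = preimages_of(gamma, [1, 10, 100, 1000])

for z in points:
    # |z| shrinks toward 0 while exp(1/z) stays at the target value
    print(abs(z), abs(cmath.exp(1 / z) - gamma))
```

Changing `gamma` to any other nonzero complex number gives the same picture, which is what one expects near an essential singularity.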


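Remark. The theorem asserts only that $f(z)$ comes arbitrarily close to any given value; it does not
assert that the value is attained. This is seen at the point at infinity $z = \infty$, which is defined as the point that is
mapped onto the origin $\tilde{z} = 0$ by the transformation $\tilde{z} = 1/z$. For instance, the
function $f(z) = e^z$ has an essential singularity at $z = \infty$ and comes arbitrarily close to zero as $z \to \infty$ along the negative real axis, but it never takes the value
zero.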


8.1.4 Rational Functions 


In comparison with the previous case, the behavior of an analytic function
near a pole is easy to describe. We now derive the following result:

♠ Theorem:

A rational function has no singularities other than poles. Conversely,
an analytic function that has no singularities other than poles is necessarily
a rational function.

A rational function $f(z)$ is of the form

$$f(z) = \frac{p(z)}{q(z)}, \tag{8.4}$$

where

$$p(z) = \alpha_0 + \alpha_1 z + \alpha_2 z^2 + \cdots + \alpha_n z^n$$

and

$$q(z) = \beta_0 + \beta_1 z + \beta_2 z^2 + \cdots + \beta_m z^m.$$

Observe that the polynomials $p(z)$ and $q(z)$ are analytic at all finite points on
the complex plane.




Proof In what follows, we assume that $p(z)$ and $q(z)$ have no common zeros;
if they do have a common zero at $z = z_0$, it is always possible to write $f(z)$ in
(8.4) as the quotient of two polynomials with no common zeros by canceling
a suitable number of the $(z - z_0)$-factors.

Obviously, the only possible singularities of $f(z)$ are situated at the zeros
of $q(z)$. Since the zeros of $p(z)$ do not coincide with those of $q(z)$, $f(z)$ necessarily
diverges at the zeros of $q(z)$. Such points can be poles but not essential
singularities in view of the Weierstrass theorem given in Sect. 8.1.3. We have
thus proved that all singularities of rational functions $f(z)$ are necessarily
poles.

To prove the converse, suppose that all the singularities of an analytic
function $f(z)$ are poles at the points $a_1, a_2, \cdots, a_n$. The orders of these poles
are denoted by $m_1, m_2, \cdots, m_n$, respectively. In the vicinity of the point $a_\nu$,
the function $f(z)$ has a Laurent series expansion of the form

$$f(z) = \frac{c^{(\nu)}_{-m_\nu}}{(z - a_\nu)^{m_\nu}} + \cdots + \frac{c^{(\nu)}_{-1}}{z - a_\nu} + \sum_{k=0}^{\infty} c^{(\nu)}_k (z - a_\nu)^k,$$

where the superscripts $(\nu)$ on $c^{(\nu)}$ indicate that they are the coefficients that
belong to the $\nu$th pole, $z = a_\nu$. Denote the principal part by

$$g_\nu(z) = \frac{c^{(\nu)}_{-m_\nu}}{(z - a_\nu)^{m_\nu}} + \cdots + \frac{c^{(\nu)}_{-1}}{z - a_\nu} \tag{8.5}$$

and consider the expression

$$h(z) = f(z) - g_1(z) - g_2(z) - \cdots - g_n(z).$$

Since $f(z) - g_\nu(z)$ is analytic at $z = a_\nu$, and $g_\nu(z)$ is analytic everywhere
except at $z = a_\nu$, it follows that $h(z)$ is analytic at all points of the complex
plane, including the point at infinity. In view of Liouville's theorem such a
function is necessarily a constant. Thus we have identically $h(z) = \gamma_0$, whence

$$f(z) = \gamma_0 + \sum_{\nu=1}^{n} g_\nu(z), \tag{8.6}$$

which implies that $f(z)$ can be brought into the form (8.4). This completes
the proof of our theorem. ♣
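The decomposition (8.6) can be observed numerically on a small example. The sketch below takes the hypothetical rational function $f(z) = (z^2+1)/[(z-1)(z-2)]$, estimates the residues at its two simple poles by a trapezoidal contour integral, subtracts the principal parts, and checks that the remainder $h(z)$ takes the same value at several unrelated points. All names and numerical settings are assumptions made for this illustration.

```python
import cmath
import math


def f(z):
    """Hypothetical rational function with simple poles at z = 1 and z = 2."""
    return (z ** 2 + 1) / ((z - 1) * (z - 2))


def residue(func, pole, radius=0.25, samples=256):
    """Approximate (1/(2*pi*i)) * integral of func over a small circle around `pole`."""
    total = 0j
    for k in range(samples):
        e = cmath.exp(2j * math.pi * k / samples)
        total += func(pole + radius * e) * radius * e
    return total / samples


poles = [1.0, 2.0]
residues = [residue(f, p) for p in poles]


def h(z):
    """f minus the principal parts g_nu(z) = c_{-1} / (z - a_nu) at each pole."""
    return f(z) - sum(c / (z - p) for c, p in zip(residues, poles))


test_points = [3.0 + 1.0j, -4.0j, 0.5, 100.0]
print(residues, [h(z) for z in test_points])
```

For this example the remainder is the constant $\gamma_0 = 1$, so that $f(z) = 1 - 2/(z-1) + 5/(z-2)$.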


Exercises 


1. Find the poles and their order of the following functions:

$$\text{(a)} \ f(z) = \frac{\sin(z+1)}{z^3}, \quad \text{(b)} \ f(z) = \frac{\sin z}{z^3}.$$

Solution: (a) Clearly, $\lim_{z \to 0} z^2 f(z) = \infty$ and $\lim_{z \to 0} z^3 f(z) =
\sin(1) \ne 0$. Hence, $f$ has a third-order pole at $z = 0$ arising from
the factor $1/z^3$. (b) Since $\lim_{z \to 0} z^3 f(z) = 0$, the pole of $f(z)$ is
not a third-order pole. Instead, noting the asymptotic behavior of
$\sin z$ near $z = 0$, we obtain

$$\lim_{z \to 0} z^2 f(z) = \lim_{z \to 0} \frac{z - (z^3/3!) + \cdots}{z} = 1.$$

Hence, $f(z)$ has a second-order pole at $z = 0$. ♣
2. Show that a function $f(z)$ cannot be bounded in the neighborhood of
its isolated singular point $z = a$.
Solution: We argue by contradiction; if $|f(z)| < M$ for
$0 < |z - a| < r$, then the expansion coefficients read

$$|c_{-n}| = \left| \frac{1}{2\pi i} \oint_C (z-a)^{n-1} f(z) \, dz \right| \le Mr^n \quad \text{for any } n \ge 1,$$

where $C$ is the circle given by $|z - a| = r$. Since $r$ may be taken
as small as desired, we have

$$c_{-1} = c_{-2} = \cdots = 0,$$

which means that the Laurent series reduces to a Taylor series.
Hence, $f(z)$ should be analytic at $z = a$, which contradicts the
assumption that $z = a$ is a singular point. ♣


3. Let both f(z) and g(z) be analytic in the vicinity of z = a and have a 
zero of mth order at z = a. Prove that 


(8.7) 
This result is called l'Hôpital's rule.
Solution: In the vicinity of z = a, we have 


flm-0 (a) 
(m+ 1)! 


of '™**)(a) yet 
(m+2)! | , 


+ (z 


and $g(z)$ has an expansion of the same form. These expressions immediately
yield the desired equation (8.7). ♣
4. Prove that if $f(z)$ has an essential singularity at $z = a$, $1/f(z)$ also
has an essential singularity.
Solution: Suppose that $f$ has an essential singularity at $z = a$
but that $1/f$ does not. If this is true, $1/f$ will at most have a pole
there (of order $N$, for instance) and is expressed in terms of the
series in $h = z - a$ as

$$\frac{1}{f} = \sum_{n=-N}^{\infty} b_n h^n.$$

Rewrite this to obtain

$$f = \frac{h^N}{\sum_{m=0}^{\infty} b_{m-N} h^m}.$$

Note that the denominator $\sum_{m=0}^{\infty} b_{m-N} h^m$ is analytic within $C$, and
thus the fraction $1/\sum_{m=0}^{\infty} b_{m-N} h^m$ is as well. As a result, the function
$f$ would be expanded into a power series in $h$ starting with $h^N$;
this result contradicts our assumption that $f(z)$ has an essential
singularity at $z = a$. Therefore, wherever $f(z)$ has an essential
singularity, $1/f$ also necessarily has one. ♣


Remark. The above result sounds intriguing when compared with the behavior
of an $f(z)$ that has a pole. If $f(z)$ has a pole of order $N$ at $z = a$, $1/f$ obviously
has no pole but does have a zero of order $N$; i.e., $1/f \propto (z-a)^N$.


8.2 Multivaluedness 

8.2.1 Multivalued Functions 

Up to this point, our concern has been limited to single-valued functions, i.e., 
functions whose values are uniquely specified once z is given. When we con- 
sider multivalued functions, many important theorems must be reformulated. 


The necessary concepts are best illustrated by considering the behavior of
the function $f(z) = z^{1/2}$ in a graphical manner. Figure 8.1 gives a contour of a
unit circle $a \to b$ on the $z$-plane. Through the transformation $w = f(z) = z^{1/2}$,
the circle is mapped onto a semicircle $A \to B$ on the $w$-plane such that

$$z = 1 \ \to \ w = 1,$$

$$z = i = e^{\pi i/2} \ \to \ w = \left( e^{\pi i/2} \right)^{1/2} = e^{\pi i/4},$$

$$z = -1 = e^{\pi i} \ \to \ w = \left( e^{\pi i} \right)^{1/2} = e^{\pi i/2},$$

$$z = -i = e^{3\pi i/2} \ \to \ w = \left( e^{3\pi i/2} \right)^{1/2} = e^{3\pi i/4}.$$

Fig. 8.1. Mapping of a circle on the $z$-plane onto an upper-half circle on the $w$-plane
through $f(z) = z^{1/2}$

Of importance is the fact that the images of the points $a$ and $b$, i.e., $A$ and $B$,
respectively, are not equal but are distinct on the $w$-plane. This suggests that
the value of $z^{1/2}$ for $z = 1$ is not uniquely determined. Furthermore, a similar
phenomenon occurs for any circular contour $a \to b$ with an arbitrarily large
(or small) radius. We thus see that the function $f(z) = z^{1/2}$ is multivalued,
at least along the positive real axis; one point on the positive real axis of the
$z$-plane is associated with two distinct points on the $w$-plane.

As a matter of fact, the multivaluedness of the function $f(z) = z^{1/2}$ noted
above occurs at all points on the whole $z$-plane (except at the origin). To see
this, we observe again that the circular contour $a \to b$ may have any radius.
As a result, all the points on the $z$-plane are correlated with only half of the
points on the $w$-plane, those for which $\mathrm{Im}\,[w] = v > 0$. The remaining values
of $w$ are generated if a second circuit $a \to b$ is made. Namely, the values of
$w$ with $v < 0$ will be correlated with those values of $z$ whose arguments lie
between $2\pi$ and $4\pi$. As a consequence, all values for $z^{1/2}$ represented on
the $w$-plane may be divided into two independent sets: the set of values of $w$
generated on the first circuit of the $z$-plane, $0 < \theta < 2\pi$, and those generated
on the second circuit, $2\pi < \theta < 4\pi$. These two independent sets of values for
$z^{1/2}$ are called the branches of $z^{1/2}$.

The concept of branch allows us to apply the theory of analytic functions 
to many-valued functions, where each branch is defined as a single-valued 
continuous function throughout its region of definition. 


8.2.2 Riemann Surfaces 


For the case $z^{1/2}$, the notion that the regions $0 < \theta < 2\pi$ and $2\pi < \theta < 4\pi$
correspond to two different regions of the $w$-plane is awkward geometrically,
since each of these two regions covers the $z$-plane completely. To re-establish
the single-valuedness and continuity of $f(z)$, it is desirable to give separate
geometric meanings to the two $z$-plane regions. This is achieved through the use
of the notion of Riemann surfaces.

A Riemann surface is an ingenious device for representing both branches 
by means of a single continuous mapping. Suppose that two separate z-planes 
are cut along the positive real semiaxis from +oo to 0 (see Fig. 8.2), and that 
the planes are superimposed on each other but retain their separate identities. 




Fig. 8.2. A Riemann surface composed of two separated z-planes 


Now suppose that the first quadrant of the upper sheet is joined along the
cut to the fourth quadrant of the lower sheet to form a continuous surface. It
is now possible to start a curve $C$ in the first quadrant of the upper sheet, go
around the origin, and cross the positive real semiaxis into the first quadrant
of the lower sheet in a continuous motion. The curve can be continued on the
lower sheet around the origin and back into the first quadrant of the upper sheet. This
process of cutting and cross-joining two planes leads to the formation of a
Riemann surface, which is thought of as a single continuous surface formed of
two Riemann sheets.
Several important remarks are in order. 


1. According to this model, the positive real semiaxis appears as a line where 
all four edges of our cuts meet. However, the Riemann surface has no 
such property. This results in the line between the first quadrant of the 
upper sheet and the fourth quadrant of the lower sheet being considered 
distinct from the line between the first quadrant of the lower sheet and the 
fourth quadrant of the upper one. There are two real positive semiaxes on 
the Riemann surface just as there are two real negative semiaxes. Hence, 
the entire Riemann surface is mapped one-to-one onto the $w$-plane. (The
origin $z = 0$ belongs to neither branch since the polar angle $\theta$ is not defined
for $z = 0$.)


2. The splitting of a multivalued function into branches is arbitrary to a
great extent. For instance, we can define the following two functions, both
of which may be treated as branches of $f(z) = \sqrt{z}$:

$$\text{Branch A:} \quad f_1(z) = \begin{cases} \sqrt{r}\,e^{i\theta/2} & \text{for } 0 \le \theta < \pi, \\ \sqrt{r}\,e^{i(\theta + 2\pi)/2} & \text{for } -\pi \le \theta < 0. \end{cases}$$

$$\text{Branch B:} \quad f_2(z) = \begin{cases} \sqrt{r}\,e^{i(\theta + 2\pi)/2} & \text{for } 0 \le \theta < \pi, \\ \sqrt{r}\,e^{i\theta/2} & \text{for } -\pi \le \theta < 0. \end{cases}$$

Note that branch A is continuous on the negative real semiaxis but is
discontinuous on the positive real semiaxis (so is branch B). These two
branches together constitute the double-valued function $f(z) = \sqrt{z}$, and
this representation is no better and no worse than the previous one.


3. The above-mentioned technique can be extended to other multivalued
functions that require more than two Riemann sheets (for instance, $f(z) =
\sqrt[3]{z}$ requires three). There are functions requiring an infinite number of
Riemann sheets, such as $f(z) = z^\alpha$ with an irrational $\alpha$.


8.2.3 Branch Point and Branch Cut 


We go back to the behavior of the multivalued function $w = f(z) = z^{1/2}$ to
introduce other important concepts referred to as branch point and branch
cut. Let us consider a certain closed curve $C$ without self-intersections in the
$z$-plane. Specify a point $z_0$ on $C$ to which we assign a definite value of the argument
$\theta_0$. Through the mapping $w = z^{1/2}$, we will find two distinct points: $w_0(z_0)$
and $w_1(z_0)$.

In what follows, we examine the variation of the functions $w_0(z)$ and $w_1(z)$
as the point $z$ moves continuously along the curve $C$. Since the argument of
the point $z$ on the curve $C$ varies continuously, the functions $w_0(z)$ and $w_1(z)$
are continuous functions of $z$ on the curve $C$.

Here, two different cases are possible. In the first case, the curve $C$ does
not contain the point $z = 0$ within it. Then, after traveling the curve $C$, the
argument of the point $z_0$ returns to the original value $\arg z_0 = \theta_0$. Hence, the
values of the functions $w_0(z)$ and $w_1(z)$ are also equal to their original values
at the point $z = z_0$ after traveling the curve $C$. Thus, in this case, two distinct
single-valued functions of the complex variable $z$ are defined on $C$:

$$w_0 = r^{1/2} e^{i\theta/2} \quad \text{and} \quad w_1 = r^{1/2} e^{i(\theta + 2\pi)/2}.$$

Obviously, if the domain $D$ of the $z$-plane has the property that any closed
curve in the domain does not contain the point $z = 0$, then two distinct single-valued
continuous functions, $w_0(z)$ and $w_1(z)$, are defined in $D$. We call the
functions $w_0(z)$ and $w_1(z)$ branches of the multivalued function $w(z) = z^{1/2}$.

In the second case, the curve $C$ contains the point $z = 0$ within it. Then,
after traversing $C$ in the positive direction, the value of the argument of the
point $z_0$ does not return to the original value $\theta_0$ but changes by $2\pi$ as expressed
by

$$\arg z_0 = \theta_0 + 2\pi.$$

Therefore, as a result of their continuous variation after traversing the curve
$C$, the values of the functions $w_0(z)$ and $w_1(z)$ at the point $z_0$ are no longer
equal to the original values. More precisely, we obtain

$$\tilde{w}_0(z_0) = w_0(z_0) e^{i\pi} \quad \text{and} \quad \tilde{w}_1(z_0) = w_1(z_0) e^{i\pi},$$

where the tilde marks the value reached after the circuit. These relations
indicate that the function $w_0(z)$ goes into the function $w_1(z)$ and vice
versa. This recurrence phenomenon stems from the fact that $z = 0$ is the
branch point of the multivalued function $f(z) = z^{1/2}$. A formal definition of
branch point is given below.


♠ Branch point:

Suppose that several branches of $f(z)$ are analytic in the neighborhood
of $z = a$ but not at $z = a$. Then, the point $z = a$ is a branch point if and
only if $f(z)$ passes from one of these branches to another when $z$ moves
along a closed circuit around $z = a$.


Remark. The point at infinity, $z = \infty$, is a branch point of $f(z)$ if and only if the
origin is a branch point of $f(1/z)$.


It is important to note that the branch points for a given multivalued function
always occur pairwise so that they are connected by a simple curve called the
branch cut (cut or branch line). Branch cuts bound the regions within
which the individual single-valued branches are defined. For instance, in the
case of $f(z) = z^{1/2}$, the branch cut ran from the branch point at $z = 0$ to
another branch point at $z = \infty$ along the positive real axis. It should be
emphasized here that any curve joining the origin ($z = 0$) and the point at
infinity ($z = \infty$) would have done just as well. For example, we could have
used the negative real axis as the branch cut, for which the regions

$$-\pi < \theta < \pi \quad \text{and} \quad \pi < \theta < 3\pi$$

(instead of $0 < \theta < 2\pi$ and $2\pi < \theta < 4\pi$) serve as the defining regions for
the first and second branch. On the $w$-plane, these two would correspond to
$\mathrm{Re}\,w > 0$ and $\mathrm{Re}\,w < 0$, respectively. We therefore may choose the branch cut
that is most convenient for the problem at hand.
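The exchange of branches around the branch point can be watched numerically by following $w = z^{1/2}$ continuously along a closed path. The sketch below is a minimal illustration with hypothetical helpers `circle` and `continue_sqrt`: at each step it picks whichever of the two square roots lies closer to the previous value, so the result does not depend on where the library function `cmath.sqrt` places its own branch cut.

```python
import cmath
import math


def circle(center, radius, turns=1, samples=400):
    """Points on a circle traversed counterclockwise `turns` times, starting at center + radius."""
    count = samples * turns
    return [center + radius * cmath.exp(2j * math.pi * k / samples) for k in range(count + 1)]


def continue_sqrt(path):
    """Follow w = z**(1/2) continuously along the list of points `path`."""
    w = cmath.sqrt(path[0])
    values = [w]
    for z in path[1:]:
        candidate = cmath.sqrt(z)
        # Of the two roots +candidate and -candidate, keep the one nearer to the previous value.
        if abs(candidate - w) > abs(candidate + w):
            candidate = -candidate
        w = candidate
        values.append(w)
    return values


around_once = continue_sqrt(circle(0j, 1.0, turns=1))       # encloses z = 0
around_twice = continue_sqrt(circle(0j, 1.0, turns=2))      # two circuits around z = 0
away = continue_sqrt(circle(2.0 + 0j, 1.0, turns=1))        # does not enclose z = 0

print(around_once[0], around_once[-1])
print(around_twice[0], around_twice[-1])
print(away[0], away[-1])
```

One circuit around the origin returns the other branch, two circuits restore the starting value, and a closed path that avoids the origin leaves the branch unchanged.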


Remark. The choice of branches and branch cuts for a given multivalued func- 
tion is not unique; however, the branch points and the number of branches 
are uniquely determined once a function is given. 


Exercises 


1. Examine the multivaluedness of the logarithm function $\ln z$.
Solution: Expressing $z$ in polar form, $\ln z = \ln(re^{i\phi}) = \ln r + i\phi$,
and changing $\phi$ by $2\pi k$ results in

$$\ln z(r, \phi + 2\pi k) = \ln r + i(\phi + 2\pi k) = \ln z(r, \phi) + 2\pi i k. \tag{8.8}$$

It follows from (8.8) that there is no nonzero value of $k$ for which
$\ln z(r, \phi + 2\pi k)$ and $\ln z(r, \phi)$ are equal. Therefore, the logarithm
function is an infinite-valued function. ♣


2. Evaluate $\log e$, $\log(-1)$, $\log(1+i)$ according to the expression (8.8).
Solution: With $n$ an arbitrary integer,

$$\log e = \log|e| + i \arg e = 1 + 2n\pi i,$$

$$\log(-1) = \log|-1| + i \arg(-1) = (2n+1)\pi i,$$

$$\log(1+i) = \log|1+i| + i \arg(1+i) = \frac{1}{2} \log 2 + \left( 2n + \frac{1}{4} \right) \pi i.$$ ♣


3. Evaluate $1^i$ and $i^i$ according to the definition of power functions: $z^\alpha =
e^{\alpha \log z}$, where $z\,(\ne 0)$ and $\alpha$ are complex numbers.


Solution: With $n$ an arbitrary integer,

$$1^i = e^{i \log 1} = e^{i \cdot 2n\pi i} = e^{-2n\pi},$$

$$i^i = e^{i \log i} = e^{i (2n + \frac{1}{2}) \pi i} = e^{-(2n + \frac{1}{2})\pi}.$$ ♣


4. Show that a power function $z^{m/n}$ with an irreducible rational number
$m/n$ ($n \ge 2$) is an $n$-valued function.
Solution: The multiple values of $z(r, \phi)^{m/n} = r^{m/n} e^{im\phi/n}$ are
found by varying the integer $k$ in the expression:

$$z(r, \phi + 2\pi k)^{m/n} = r^{m/n} e^{im\phi/n} e^{i 2\pi k m/n} = z(r, \phi)^{m/n} e^{i 2\pi k m/n}.$$

Substituting $k = n$ yields

$$z(r, \phi + 2\pi n)^{m/n} = e^{i 2\pi m} z(r, \phi)^{m/n} = z(r, \phi)^{m/n},$$

wherein $e^{i 2\pi m} = 1$ for arbitrary $m \in \mathbb{N}$. Hence, all multiple values
of $z^{m/n}$ at a given $z$ are found with a value of $k$ in the range
$0 \le k \le n-1$. Since there are $n$ different values of $k$ in this range,
$z^{m/n}$ is an $n$-valued function. ♣
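The results of Exercises 3 and 4 are easy to reproduce numerically. In the sketch below, the helper names `rational_power_values` and `complex_power_values`, the sample value $z = -8$ with $m/n = 2/3$, and the range of branch indices are illustrative assumptions; the formulas are those of the solutions above.

```python
import cmath
import math


def rational_power_values(z, m, n):
    """The n values r**(m/n) * exp(i*m*(phi + 2*pi*k)/n), k = 0 .. n-1, of z**(m/n)."""
    r, phi = cmath.polar(z)
    return [r ** (m / n) * cmath.exp(1j * m * (phi + 2 * math.pi * k) / n) for k in range(n)]


def complex_power_values(z, alpha, branches):
    """Values exp(alpha * (Log z + 2*pi*i*k)) of z**alpha for the integers k in `branches`."""
    return [cmath.exp(alpha * (cmath.log(z) + 2j * math.pi * k)) for k in branches]


cube_values = rational_power_values(-8, 2, 3)               # the three values of (-8)**(2/3)
i_to_the_i = complex_power_values(1j, 1j, range(-1, 2))     # branches k = -1, 0, 1 of i**i

print(cube_values)
print(i_to_the_i)
```

Each entry of `cube_values` cubes to $(-8)^2 = 64$, and every listed value of $i^i$ is a positive real number of the form $e^{-(2k + 1/2)\pi}$.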


8.3 Analytic Continuation 


8.3.1 Continuation by Taylor Series 


It is often the case that a complex function is defined only in a limited region
in the complex plane. For instance, a series representation of a function is of
use only within its radius of convergence, but provides no direct information
about the function outside this radius of convergence. An illustrative example
is a function $f(z)$ defined by

$$f(z) = 1 + z + z^2 + z^3 + \cdots. \tag{8.9}$$

Obviously, this function is identified with $1/(1-z)$ for $|z| < 1$, whereas it
diverges for $|z| > 1$ and thus is no longer equivalent to $1/(1-z)$. Nevertheless,
a sophisticated technique makes it possible to identify the function $f(z)$ given
in (8.9) with $1/(1-z)$ even for the region $|z| > 1$. This technique, by which
the defined region of a function is extended to an 'uncultivated' region, is
called analytic continuation. The resultant function may often be defined
by sequential continuation over the entire complex plane without reference to
the original region of definition.

To see an actual process of analytic continuation, we suppose that a func-
tion $f$ is given as a power series around $z = 0$, with a radius of convergence
$R$ and a singular point of $f$ being on the circle of convergence. We show
that it is possible to extend the function outside $R$. We first note that at any
point $z = a$ within the circle ($|a| < R$), we can evaluate not only the value of
the series but all its derivatives at that point as well, because the function $f$
is analytic there and the differentiated series have the same radius of convergence.
Therefore, we can obtain a Taylor series of $f(z)$ around $z = a$ as

$$f(z) = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!} (z - a)^n. \tag{8.10}$$

The radius of convergence of this series is the distance to the nearest singular
point, say $z = z_s$ (see Fig. 8.3a). The resultant circle of convergence with
radius $R_0 = |z_s - a|$ is indicated by the solid circle in the figure. One may
repeat this process using a new point, e.g., $z = b$, not necessarily within the
original circle of convergence (see Fig. 8.3b), about which a new series such as
(8.10) can be set up (see Fig. 8.3c). Continuing in this way, it is apparently
possible by means of such a series of overlapping circles to obtain values for
$f$ for every point in the complex plane excluding the singular points.
Our current discussion can be summarized as follows:
Our current discussion can be summarized as follows: 


1. Let $f(z)$ be defined by its Taylor series expansion around $z = a$ within
some circle $|z - a| = r$.

2. Specify a certain point $z = b$ within the circle and evaluate $f(b), f'(b), \cdots$
to obtain a Taylor series of $f(z)$ around $z = b$.

3. Observe that the latter series converges within a circle $|z - b| = r'$ that
intersects the first circle but may contain a region that is not within the
first circle.

4. Specify again another point $z = c$ within the circle $|z - b| = r'$ and repeat
the process described above.
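The four steps above can be sketched for the series (8.9). This is an illustrative sketch under stated assumptions, not the authors' procedure: the new centre $b = -1/2$, the truncation orders, and the helper names `recentered_coeffs` and `evaluate` are all hypothetical choices. The coefficients $f^{(k)}(b)/k!$ are obtained only from the original series, using exact rational arithmetic to avoid cancellation in the alternating sums.

```python
from fractions import Fraction
from math import comb

def recentered_coeffs(b, num_coeffs, num_terms):
    """Taylor coefficients f^(k)(b)/k! of f(z) = sum_n z^n, computed from the
    original series alone: f^(k)(b)/k! = sum_{n>=k} C(n, k) b^(n-k)."""
    return [sum(comb(n, k) * b ** (n - k) for n in range(k, num_terms))
            for k in range(num_coeffs)]

def evaluate(coeffs, b, z):
    """Evaluate the re-centred series sum_k coeffs[k] * (z - b)^k."""
    w = z - b
    return sum(c * w ** k for k, c in enumerate(coeffs))

b = Fraction(-1, 2)                  # new centre inside |z| < 1 (assumed choice)
coeffs = recentered_coeffs(b, 40, 300)

z = Fraction(-6, 5)                  # |z| > 1, but |z - b| = 0.7 < 1.5
value = float(evaluate(coeffs, b, z))
exact = 1 / (1 - float(z))           # 1/(1 - z), the function being continued
print(value, exact)
```

The point $z = -6/5$ lies outside the original circle of convergence, yet the re-centred series reproduces $1/(1 - z)$ there, because its circle of convergence reaches out to the singular point $z = 1$ at distance $3/2$ from $b$.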


8.3.2 Function Elements 


We know that the term ‘analytic continuation’ refers to a method that allows 
us to extend the defining region of a complex function. Alternatively, this term 
can refer to the function that is newly found through analytic continuation of 
some other function. The formal definition is given below. 




Fig. 8.3. Illustration of an analytic continuation procedure 


♠ Analytic continuation:

Given a single-valued analytic function $f_1(z)$ defined on a region $D_1$, the
analytic function $f_2(z)$ defined on $D_2$ is called an analytic continuation
of $f_1(z)$ to $D_2$ if and only if the intersection $D_1 \cap D_2$ contains a simply
connected open region where $f_1(z) = f_2(z)$.


If the two analytic functions $f_1(z)$ and $f_2(z)$ defined on $D_1$ and $D_2$, respec-
tively, are analytic continuations of one another, then it is evident that an
analytic function $f(z)$ can be defined on $D_1 \cup D_2$ by setting

$$f(z) = \begin{cases} f_1(z) & \text{in } D_1, \\ f_2(z) & \text{in } D_2. \end{cases}$$

Here, $f_1$ and $f_2$ are called function elements of $f$. More generally, we can
consider a sequence of function elements $(f_1, f_2, \cdots, f_n)$ such that $f_k$ is an
analytic continuation of $f_{k-1}$. The elements of such a sequence are called
analytic continuations of each other. Relevant terminology for this point
is given below.




♠ General analytic function:

A general analytic function $f$ is a nonvoid collection of function elements
$f_k$ in which any two elements are analytic continuations of each other by
way of a chain whose links are members of $f$.


♠ Complete analytic function:
A complete analytic function $f$ is a general analytic function that con-
tains all the analytic continuations of any one of its elements.


A complete analytic function is evidently maximal in the sense that it can-
not be further extended. Moreover, it is clear that every function element be-
longs to a unique complete analytic function. Incomplete general analytic
functions are more arbitrary, and there are many cases in which two different
collections of function elements should be regarded as defining the same func-
tion. For instance, a single-valued function $f(z)$ defined in $D$ can be identified
either with the collection that consists of the single function element defined
on $D$ or with the collection of all function elements defined on $D' \subset D$.


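Examples 1. Let us consider the functions

$$f_1(z) = \sum_{n=0}^{\infty} z^n \quad \text{defined on } |z| < 1 \tag{8.11}$$

and

$$f_2(z) = \sum_{n=0}^{\infty} \left(\frac{3}{5}\right)^{n+1} \left(z + \frac{2}{3}\right)^n \quad \text{defined on } \left|z + \frac{2}{3}\right| < \frac{5}{3}. \tag{8.12}$$

Both series converge to $1/(1 - z)$; in particular, the latter converges since

$$\sum_{n=0}^{\infty} \left(\frac{3}{5}\right)^{n+1} \left(z + \frac{2}{3}\right)^n = \frac{3/5}{1 - \frac{3}{5}\left(z + \frac{2}{3}\right)} = \frac{1}{1 - z}.$$

Therefore, the two functions represent the same function $f(z) = 1/(1 - z)$
in the two overlapping regions (see Fig. 8.4), although they have different
series representations. In this context, we can write

$$f(z) = \begin{cases} f_1(z) & \text{for } z \in D_1,\ D_1 = \{z : |z| < 1\}, \\ f_2(z) & \text{for } z \in D_2,\ D_2 = \{z : |z + \tfrac{2}{3}| < \tfrac{5}{3}\}. \end{cases}$$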


2. Another illustrative example is given by

$$f_1(z) = \int_0^{\infty} e^{-zt}\,dt \quad \text{defined on } \operatorname{Re} z > 0 \tag{8.13}$$


Fig. 8.4. Both functions $f_1(z)$ in (8.11) and $f_2(z)$ in (8.12) represent the same
function $f(z) = 1/(1 - z)$ in the overlapping region $D_1 \cap D_2$


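and

$$f_2(z) = i \sum_{n=0}^{\infty} \left(\frac{z + i}{i}\right)^n \quad \text{defined on } |z + i| < 1.$$

Observe that each of $f_1$ and $f_2$ equals $1/z$ in its respective defining region.
Thus, we have

$$\frac{1}{z} = \begin{cases} f_1(z) & \text{for } z \in D_1,\ D_1 = \{z : \operatorname{Re} z > 0\}, \\ f_2(z) & \text{for } z \in D_2,\ D_2 = \{z : |z + i| < 1\}. \end{cases}$$

The two functions are analytic continuations of one another, and $f(z) =
1/z$ is the analytic continuation of both $f_1$ and $f_2$ for all $z$ except $z = 0$.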
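The two examples above can be probed numerically. The sketch below is an illustration only: the function names, the truncation at 200 terms, and the three sample points are assumptions chosen so that each point lies well inside the relevant circle of convergence.

```python
def series_about_origin(z, terms=200):
    """Truncated series (8.11): sum of z^n, valid for |z| < 1."""
    return sum(z ** n for n in range(terms))

def series_about_minus_two_thirds(z, terms=200):
    """Truncated series (8.12), valid for |z + 2/3| < 5/3."""
    return sum((3 / 5) ** (n + 1) * (z + 2 / 3) ** n for n in range(terms))

def series_about_minus_i(z, terms=200):
    """Truncated series of the second example, valid for |z + i| < 1."""
    return 1j * sum(((z + 1j) / 1j) ** n for n in range(terms))

z_overlap = 0.3 + 0.4j     # inside both |z| < 1 and |z + 2/3| < 5/3
z_outside = -1.5 + 0.5j    # |z| > 1, but still inside |z + 2/3| < 5/3
z_left = -0.3 - 0.9j       # Re z < 0, but inside |z + i| < 1

gap_in_overlap = abs(series_about_origin(z_overlap)
                     - series_about_minus_two_thirds(z_overlap))
gap_outside = abs(series_about_minus_two_thirds(z_outside) - 1 / (1 - z_outside))
gap_left = abs(series_about_minus_i(z_left) - 1 / z_left)
print(gap_in_overlap, gap_outside, gap_left)
```

All three gaps should be at the level of rounding error: the two series of Example 1 agree on the overlap, and each second element reproduces the closed form at a point where the first element is not defined.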


Remark. In some cases, it is impossible to extend the function outside of a 
finite region because an infinite number of singularities are located densely on 
the boundary of the region. In that event, the boundary of this region is called 
the natural boundary of the function and the region within this boundary 
is called the region of the existence of the function. 


8.3.3 Uniqueness Theorem 


Having introduced the concept of analytic continuation, we may ask a question
as to whether the function resulting from an analytic continuation process
is uniquely determined, independent of the continuing path; i.e., whether a
function that is continued along two different routes from one area to another
will have the same value in the final area. We now attempt to answer this
question by examining the theorem below.


♠ Uniqueness theorem:

Let $f_1(z)$ and $f_2(z)$ be analytic within a region $D$. If the two functions
coincide in the neighborhood of a point $z \in D$, then they coincide through-
out $D$.


Proof The theorem to be proven is rewritten in the following statement: If
both $f(z)$ and $g(z)$ are analytic at $z_0$ and if $f(z_n) = g(z_n)$ with $n = 1, 2, \cdots$ at
points $z_n$ that satisfy $\lim_{n \to \infty} z_n = z_0$ but $z_n \neq z_0$ for all $n$, then $f(z) = g(z)$
throughout $D$. We now prove it.

Let $h(z) = f(z) - g(z)$. Here, $f$ and $g$ are assumed to satisfy the conditions
given in the statement above, so that $h(z_n) = 0$ for all $n$ and $h(z)$ is analytic
at $z_0$. Owing to the analyticity of $h(z)$ at $z_0$, we have the expansion

$$h(z) = a_0 + a_1(z - z_0) + a_2(z - z_0)^2 + \cdots,$$

which converges in a certain circle around $z_0$. Since $h(z)$ is continuous at $z_0$,
we have
$$h(z_0) = \lim_{n \to \infty} h(z_n) = 0,$$
which means that the coefficient $a_0$ is zero. Then, since
$h(z)/(z - z_0) = a_1 + a_2(z - z_0) + \cdots$ is also continuous at $z_0$ and vanishes
at every $z_n$, we have
$$a_1 = \lim_{n \to \infty} \frac{h(z_n)}{z_n - z_0} = 0.$$
Continuing in this fashion, we find successively
that all the coefficients vanish. In its circle of convergence, the function $h(z)$
is therefore identically zero. This completes the proof. ♣


This remarkable theorem demonstrates the strong correlation between the 
behaviors of analytic functions on different parts of the complex plane. For 
example, if two functions agree in value over a small arc (arbitrarily small as 
long as it is not a point), then they are identical in their common region of 
analyticity. 


8.3.4 Conservation of Functional Equations 


An important consequence of the uniqueness theorem is the so-called principle 
of the conservation of a functional equation. 




♠ Conservation of functional equations:

Let $F(p, q, r)$ be an analytic function for all values of the three vari-
ables $p, q, r$, and let $f(z)$ and $g(z)$ be analytic functions of $z$. If a relation
$F[f(z), g(z), z] = 0$ between function elements $f(z)$ and $g(z)$ holds on a do-
main, then this relation is also true for all analytic continuations of these
function elements.


Remark. In plain words, this theorem states that analytic continuations of $f(z)$
satisfy every functional (and differential) equation satisfied by the original
$f(z)$.


This theorem can easily be generalized to cases of functional equations involv-
ing more than two functions. We illustrate this by two examples.


Examples 1. From elementary trigonometry, we know that the real function
$\sin x$ has the addition theorem

$$\sin(x + u) = \sin x \cos u + \cos x \sin u,$$

where $u$ is an arbitrary real value. Since $\sin z$, $\cos z$, and $\sin(z + u)$ are
analytic for all finite values of $z$, and since the relation

$$\sin(z + u) = \sin z \cos u + \cos z \sin u$$

is satisfied if $z$ is any point on the real axis, it follows by analytic contin-
uation that the same relation must hold for all values of $z$. If we repeat
the same argument with respect to the real variable $u$, we find that $u$ may
be replaced by a complex variable $w$ without invalidating the relation in
question. Hence, the addition theorem of the function $\sin z$ is true for
arbitrary complex values of $z$ and $w$.

2. Another important example is afforded by functions satisfying differential
equations. To take a simple case, we consider the function

$$f(z) = \log(1 + z).$$

This is represented for $|z| < 1$ by the power series

$$f(z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \cdots, \tag{8.14}$$

which yields

$$f'(z) = 1 - z + z^2 - \cdots = \frac{1}{1 + z}.$$

In this context, the identity

$$f'(z) = \frac{1}{1 + z} \tag{8.15}$$

appears to be valid only for $|z| < 1$. However, by the principle of
conservation, the identity (8.15)
must hold for all analytic continuations of the power series (8.14).
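Both examples lend themselves to a quick numerical check. The sketch below is illustrative only: the sample points are arbitrary, the derivative is approximated by a central difference with an assumed step of $10^{-6}$, and the principal branch `cmath.log` is used as one analytic continuation of (8.14) away from its branch cut.

```python
import cmath

def addition_residual(z, w):
    """sin(z + w) - (sin z cos w + cos z sin w) for complex z and w."""
    return cmath.sin(z + w) - (cmath.sin(z) * cmath.cos(w)
                               + cmath.cos(z) * cmath.sin(w))

def derivative_residual(z, step=1e-6):
    """Central-difference estimate of d/dz log(1 + z) minus 1/(1 + z)."""
    estimate = (cmath.log(1 + z + step) - cmath.log(1 + z - step)) / (2 * step)
    return estimate - 1 / (1 + z)

# Hypothetical sample points: real, complex, and purely imaginary arguments.
samples = [(1.2, 0.7), (0.5 + 2j, -1.3 + 0.4j), (-3j, 2 + 1j)]
max_addition_residual = max(abs(addition_residual(z, w)) for z, w in samples)

z_outside = 2 + 1j   # |z| > 1, outside the disc where (8.14) converges
max_derivative_residual = abs(derivative_residual(z_outside))
print(max_addition_residual, max_derivative_residual)
```

The first residual tests the addition theorem off the real axis; the second tests (8.15) at a point where the power series (8.14) itself diverges.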


8.3.5 Continuation Around a Branch Point 


The uniqueness theorem given in Sect. 8.3.3 also gives us the following corol- 
lary: 


♠ Theorem:

If $D_1$ and $D_2$ are regions into which $f(z)$ has been continued from $D$,
yielding the corresponding functions $f_1$ and $f_2$, and if $D_3 = D_1 \cap D_2$ also
overlaps $D$, then $f_1 = f_2$ throughout $D_3$.


It is important to note that the validity of this theorem is due to the condition
that $D_3$ and $D$ have a common region. If this condition is not satisfied, the
uniqueness of analytic continuation may break down. Instead, one can say: If
analytic continuation of a function $f$ along two different routes from $z_0$ to $z_1$
yields two different values at $z_1$, then $f(z)$ must have a certain kind of singu-
larity between the two routes. This seems obvious by recalling the fact that
the radius of convergence of a power series extends up to the next singularity
of the function; if there were no singularities between the two routes, then it
would be possible to fill in the region between the two routes by means of an-
alytic continuation based on the power series. Then we would obtain sufficient
overlapping so that the uniqueness theorem would be satisfied. In that event
$f(z_1)$ for the two different routes would be identical, in contradiction to our
hypothesis. There must therefore be a singularity between the two routes.

Note that the last discussion does not state that different values must be 
obtained if there is any kind of singularity between the two routes. It must be 
a particular type of singularity to cause a discrepancy, and we call it a branch 
point, as we introduced earlier. An analytic function involving branch points 
is said to be multivalued and the various possible sets of values generated by 
the process of analytic continuation are known as branches. Intuitively, all 
the possible values of a function at a given point may be obtained by the 
process of analytic continuation if one winds about the branch point as many 
times as necessary. 


8.3.6 Natural Boundaries 


In all the examples considered so far, the singularities were isolated points. It
is, however, easy to construct functions for which this is not the case. Consider,
say, the function

$$f(z) = \frac{1}{\sin(1/z)}.$$

The denominator vanishes for $1/z = n\pi$ with a nonzero integer $n$. Hence, the points
$z = 1/(n\pi)$ are singular points of $f(z)$. Each of them is isolated, but they
accumulate at the origin, so the singularity at $z = 0$ is not isolated.
It is further possible for the singular points of a function to fill
a whole arc of a continuous curve; in this case, we speak of a singular line
of the function.

Particularly interesting is a situation in which a function $f(z)$ has a closed
singular line $C$. In this case, it is obviously impossible to continue $f(z)$
analytically across $C$. The entire domain of definition of $f(z)$ is therefore
the interior of $C$, and we say that $C$ is a natural boundary of $f(z)$.

Such an occurrence is not as unusual as it may seem. Consider, for instance, 
the analytic function f(z) defined by the power series 


$$f(z) = z + z^2 + z^4 + z^8 + \cdots = \sum_{n=0}^{\infty} z^{2^n}. \tag{8.16}$$

By the root test given in Sect. 2.4.3, the circle of convergence of this series
turns out to be $|z| < 1$. Thus $f(z)$ must have at least one singularity on $|z| = 1$.
For the sake of simplicity, we assume that this singularity is situated at the
point $z = 1$; a different location will cause a minor change in the argument.
From the definition of $f(z)$, it follows that

$$f(z^2) = z^2 + z^4 + z^8 + \cdots = \sum_{n=1}^{\infty} z^{2^n} = f(z) - z.$$

By the principle of conservation (see Sect. 8.3.4), the functional equation
$$f(z) = z + f(z^2) \tag{8.17}$$
is true for all analytic continuations of $f(z)$. Observe that (8.17) gives
$$f'(z) = 1 + 2z f'(z^2),$$
which means that $f(z)$ cannot have a derivative at $z = -1$ since from hypoth-
esis $f'(1)$ does not exist. Thus, $z = -1$ is also a singular point of $f(z)$. In the
same way, from the relation

$$f(z) = z + f(z^2) = z + z^2 + f(z^4)$$

it follows that the points $z$ for which $z^4 = 1$ are singularities of $f(z)$.
Continuing in this fashion, we conclude that all points $z$ for which $z^{2^n} = 1$
are singularities of $f(z)$. But these are the points $e^{2\pi i k/2^n}$ that divide the
circumference $|z| = 1$ into $2^n$ equal parts. Since, for $n \to \infty$, all points on
$|z| = 1$ are limits of these points and since the limit point of singular points
is also a singularity, it follows that all points on $|z| = 1$ are singular points of
$f(z)$. We have thus proven that the unit circle is the natural boundary of the
analytic function (8.16).
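The behavior of the series (8.16) near the unit circle can be explored numerically. The following sketch is an illustration under assumptions: the name `lacunary`, the truncation at 60 terms, the direction $e^{2\pi i \cdot 3/8}$ (an eighth root of unity), and the three radii are arbitrary choices.

```python
import cmath
import math

def lacunary(z, terms=60):
    """Partial sum of z + z^2 + z^4 + z^8 + ... by repeated squaring."""
    total = 0j
    power = z
    for _ in range(terms):
        total += power
        power = power * power   # z^(2^n) -> z^(2^(n+1))
    return total

# Functional equation (8.17): f(z) = z + f(z^2) inside the unit disc.
z = 0.6 + 0.5j
functional_gap = abs(lacunary(z) - (z + lacunary(z * z)))

# Approach the circle along a ray through an eighth root of unity.
direction = cmath.exp(2j * math.pi * 3 / 8)
radii = [0.9, 0.999, 0.999999]
moduli = [abs(lacunary(r * direction)) for r in radii]
print(functional_gap, moduli)
```

Along this ray the terms $z^{2^n}$ with $n \geq 3$ are real and positive, so $|f|$ grows without bound as $r \to 1$, consistent with a singularity at that boundary point. The same happens for every direction $e^{2\pi i k/2^n}$.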


8.3.7 Technique of Analytic Continuations 


The uniqueness theorem is the fundamental theorem in the theory of analytic
continuation. However, in practice, the most relevant method would be one
that tells us whether a function $f_2$ is the analytic continuation of a function
$f_1$.

Let us describe two possible methods of analytic continuation: The first is
based on the Schwarz principle of reflection, which essentially makes use
of the functional relation $f(z^*) = f(z)^*$.


♠ Schwarz principle of reflection:
If $f(z)$ is analytic within a region $D$ intersected by the real axis and is
real on the real axis, then we have $f(z^*) = f(z)^*$.


Proof Expand $f(z)$ in a Taylor series about a point $a$ on the real axis. The
coefficients of the Taylor series are real by virtue of the hypothesis that $f(z)$
is real on the real axis. Hence, we have

$$f(z) = \sum_{n} c_n (z - a)^n, \tag{8.18}$$

where $c_n$ is real. Then
$$f(z^*) = \sum_{n} c_n (z^* - a)^n = f(z)^*, \tag{8.19}$$
proving the theorem. ♣


The above theorem holds for any point within the circle of convergence of
the power series. By the methods of analytic continuation, therefore, it may
be extended to include any nonsingular point conjugate to a point in $D$. As a
result, the function in question can be continued from a region above the real
axis to a region below.

A second method employs explicit functional relations such as addition 
formulas or recurrence relations. A simple example is provided by the 
addition formula 


$$f(z + a) = f(z) f(a).$$


If f were known only in a given region, it would be continued outside that 
region to any point given by the addition of the coordinates of any two points 
within the region. A less trivial example occurs in the theory of gamma 
functions. The gamma function is defined by the integral 


$$\Gamma(z) = \int_0^{\infty} e^{-t} t^{z-1}\,dt. \tag{8.20}$$

This integral converges only for $\operatorname{Re} z > 0$, so that it defines $\Gamma(z)$ for only
the right half of the complex plane. From (8.20), one may readily derive (by
integrating by parts) a functional relationship between $\Gamma(z)$ and $\Gamma(z + 1)$:

$$z\Gamma(z) = \Gamma(z + 1). \tag{8.21}$$

We may now use (8.21) to continue $\Gamma(z)$ into the $\operatorname{Re} z < 0$ part of the complex
plane. At first, we assume that $\Gamma(z)$ is known for $x > 0$, where $x = \operatorname{Re} z$. Then using recurrence
relation (8.21), the points in the strip $-1/2 < x < 1/2$ can be computed in
terms of the values of $\Gamma(z)$ for $x > 0$. The function so defined and the original
function have an overlapping region of convergence so that it is the analytic
continuation into the negative $x$-region.
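The continuation by the recurrence (8.21) can be sketched on the real axis. This is a minimal illustration, not the authors' construction: the name `gamma_continued` is hypothetical, the sketch is restricted to real arguments, and Python's `math.gamma` is used only for $x > 0$, standing in for the integral (8.20).

```python
import math

def gamma_continued(x):
    """Continue Gamma from x > 0 to negative non-integer real x by repeatedly
    applying Gamma(x) = Gamma(x + 1) / x, i.e. relation (8.21)."""
    if x <= 0 and x == int(x):
        raise ValueError("Gamma has poles at 0, -1, -2, ...")
    divisor = 1.0
    while x <= 0:
        divisor *= x      # accumulate x (x + 1) ... until the argument is positive
        x += 1
    return math.gamma(x) / divisor   # math.gamma is evaluated only for x > 0

value_minus_half = gamma_continued(-0.5)      # equals -2 sqrt(pi)
value_minus_three_halves = gamma_continued(-1.5)   # equals 4 sqrt(pi) / 3
print(value_minus_half, value_minus_three_halves)
```

Each pass through the loop moves the argument one unit to the right, so any non-integer negative $x$ is reached from the region $x > 0$ in finitely many steps.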


8.3.8 The Method of Moment 
Suppose that we are given a power series $f(z) = \sum_{n=0}^{\infty} a_n z^n$ where the co-
efficients $a_n$ are the moments of a given continuous function. For example,
suppose that there exists a continuous function $g$ on $[0, 1]$ such that

$$a_n = \int_0^1 g(t)\, t^n\,dt.$$

Then
$$f(z) = \sum_{n=0}^{\infty} \left[\int_0^1 g(t)\, t^n\,dt\right] z^n = \sum_{n=0}^{\infty} \left[\int_0^1 g(t) (zt)^n\,dt\right],$$

and interchanging the order of summation and integration, we find that

$$f(z) = \int_0^1 \sum_{n=0}^{\infty} g(t) (zt)^n\,dt = \int_0^1 \frac{g(t)}{1 - zt}\,dt.$$

(The interchange of summation and integration is easy to justify if $|z| < 1$.)
Moreover, this integral form serves to define an analytic extension of the
original power series.


Examples Consider

$$f(z) = \sum_{n=0}^{\infty} \frac{z^n}{n + 1} \quad (|z| < 1). \tag{8.22}$$

Since
$$\frac{1}{n + 1} = \int_0^1 t^n\,dt,$$
we set $g(t) = 1$ to obtain

$$f(z) = \int_0^1 \frac{dt}{1 - zt}$$

for $|z| < 1$.

The integral above is the analytic continuation of the original representation
(8.22), so that the latter is analytic throughout the complex plane except for
the semi-infinite line $[1, \infty)$. [In fact, the analytic continuation has a discon-
tinuity at every point of the interval $[1, \infty)$.]
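The example can be checked numerically. In the sketch below, the composite Simpson rule, the number of panels, and the two sample points are assumptions made for illustration. Outside the unit disc the integral is compared with $-\log(1 - z)/z$, the closed form obtained by integrating $1/(1 - zt)$ directly, with `cmath.log` taken on its principal branch.

```python
import cmath

def moment_integral(z, panels=2000):
    """Composite Simpson approximation of the integral of 1/(1 - z t) over [0, 1]."""
    h = 1.0 / panels
    total = 1 / (1 - z * 0.0) + 1 / (1 - z * 1.0)
    for j in range(1, panels):
        weight = 4 if j % 2 else 2
        total += weight / (1 - z * (j * h))
    return total * h / 3

def power_series(z, terms=400):
    """Truncated series (8.22): sum of z^n / (n + 1), valid for |z| < 1."""
    return sum(z ** n / (n + 1) for n in range(terms))

z_inside = 0.5j          # |z| < 1: series and integral should agree
z_outside = -3 + 1j      # |z| > 1: only the integral is defined

gap_inside = abs(moment_integral(z_inside) - power_series(z_inside))
gap_outside = abs(moment_integral(z_outside)
                  + cmath.log(1 - z_outside) / z_outside)
print(gap_inside, gap_outside)
```

The second comparison is made at a point where the series (8.22) diverges, which is where the integral representation does its work as a continuation.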


Exercises 
1. Suppose $f(z) = \sum_{k=0}^{\infty} c_k z^{n_k}$ with $\liminf_{k \to \infty} \dfrac{n_{k+1}}{n_k} > 1$. Prove that the circle of
convergence of $f(z)$ above is a natural boundary for $f$.


Solution: Since the result is independent of $c_k$, we may assume
without loss of generality that the radius of convergence is 1. In
addition, neglecting finitely many terms if necessary, we assume
that for some $\delta > 0$ and for all $k$, $n_{k+1}/n_k \geq 1 + \delta$. Finally,
it suffices to show that $f$ is singular at the point $z = 1$. The
same result applied to the series $\sum_{k=0}^{\infty} c_k (z e^{-i\theta})^{n_k}$ shows that $f$
is singular at any point $z = e^{i\theta}$.

Choose an integer $m > 0$ such that $(m + 1)/m < 1 + \delta$ and
consider the power series $g(w)$ obtained by setting $z = (w^m +
w^{m+1})/2$. We then find that

$$g(w) = f\left(\frac{w^m + w^{m+1}}{2}\right) = \sum_{k=0}^{\infty} c_k \left(\frac{w^m + w^{m+1}}{2}\right)^{n_k}$$
$$= \frac{c_0}{2^{n_0}} w^{m n_0} + \frac{c_0}{2^{n_0}} n_0 w^{m n_0 + 1} + \cdots + \frac{c_0}{2^{n_0}} w^{m n_0 + n_0}$$
$$+ \frac{c_1}{2^{n_1}} w^{m n_1} + \frac{c_1}{2^{n_1}} n_1 w^{m n_1 + 1} + \cdots + \frac{c_1}{2^{n_1}} w^{m n_1 + n_1} + \cdots.$$

Note that in this expression no two terms involve the same power
of $w$, since the inequality $m n_{k+1} > m n_k + n_k$ holds whenever
$n_{k+1}/n_k > (m + 1)/m$. If $|w| < 1$, then $(|w|^m + |w|^{m+1})/2 < 1$,
and since $f(z)$ is absolutely convergent for $|z| < 1$, the series
$\sum_{k=0}^{\infty} |c_k| [(|w|^m + |w|^{m+1})/2]^{n_k}$ converges. Hence, for $|w| < 1$,
$g(w)$ is absolutely convergent. On the other hand, if we take $w$
real and greater than 1, then $(w^m + w^{m+1})/2 > 1$, so the series
$\sum_{k=0}^{\infty} c_k [(w^m + w^{m+1})/2]^{n_k}$ diverges. Note, though, that the $j$th
partial sums $s_j$ of the above series are exactly the $n_j(m + 1)$th
partial sums of the power series of $g$. Hence, the series for $g(w)$
diverges and $g$, too, has a radius of convergence of 1. This means
that $g(w)$ must have a singularity at some point $w_0$ with $|w_0| = 1$.
If $w_0 \neq 1$, then $|(w_0^m + w_0^{m+1})/2| < 1$ and since $f$ is analytic in
$|z| < 1$, $g$ is analytic at $w_0$. Thus $g$ must have a singularity at
$w_0 = 1$ and since $g(w) = f[(w^m + w^{m+1})/2]$, $f(z)$ must have a
singularity at $z = 1$. ♣


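2. Define an analytic continuation of: (i) $\displaystyle\sum_{n=1}^{\infty} \frac{z^n}{\sqrt{n}}$, (ii) $\displaystyle\sum_{n=0}^{\infty} \frac{z^n}{n^2 + 1}$.
Solution:

(i) Since $\dfrac{1}{\sqrt{n}} = \dfrac{1}{\Gamma(1/2)} \displaystyle\int_0^{\infty} e^{-nt} t^{-1/2}\,dt$, we have

$$\sum_{n=1}^{\infty} \frac{z^n}{\sqrt{n}} = \frac{1}{\Gamma(1/2)} \int_0^{\infty} \sum_{n=1}^{\infty} (z e^{-t})^n\, t^{-1/2}\,dt = \frac{1}{\Gamma(1/2)} \int_0^{\infty} \frac{z\, t^{-1/2}}{e^t - z}\,dt,$$

which is analytic outside of the interval $[1, \infty)$.

(ii) Since $\dfrac{1}{n^2 + 1} = \displaystyle\int_0^{\infty} e^{-nt} \sin t\,dt$,

$$\sum_{n=0}^{\infty} \frac{z^n}{n^2 + 1} = \int_0^{\infty} \sum_{n=0}^{\infty} (z e^{-t})^n \sin t\,dt = \int_0^{\infty} \frac{e^t \sin t}{e^t - z}\,dt,$$

which is analytic outside of the interval $[1, \infty)$. ♣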


3. Suppose that $f$ is bounded and analytic in $\operatorname{Im} z \geq 0$ and real on the real
axis. Prove that $f$ is constant.
Solution: By the Schwarz reflection principle, $f$ can be extended
to the entire plane and would then be a bounded entire function.
Hence, $f$ is constant. ♣


4. Given an entire function that is real on the real axis and imaginary on 
the imaginary axis, prove that it is an odd function; i.e., f(z) = —f(—z). 


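Solution: Set $f(z) = u(x + iy) + iv(x + iy)$. The
Schwarz reflection principle implies that $f(z^*) = f(x - iy) =
u(x - iy) + iv(x - iy) = u(x + iy) - iv(x + iy) = f(z)^*$. In a similar
way, reflection about the imaginary axis, on which $f$ is imaginary, gives
$f(-z^*) = f(-x + iy) = -u(x + iy) + iv(x + iy) = -f(z)^*$. Combining the two
relations, we have $f(-z) = f(-x - iy) = u(-x - iy) + iv(-x - iy) =
-u(x + iy) - iv(x + iy) = -f(z)$. ♣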


9 


Contour Integrals 


Abstract In this chapter, we show that singularities do not interfere with the anal- 
ysis of complex functions but are useful in extracting complex integrals along closed 
contours. This utility of singularities is based on the residue theorem (Sect. 9.1.1), 
argument principle (Sect. 9.4), and principal value integrals (Sect. 9.5.1), all of which 
correlate the nature of singularities within and/or on the contour with the relevant 
complex integrals. 


9.1 Calculus of Residues 


9.1.1 Residue Theorem 


In the preceding two chapters, we provided the theoretical bases of complex 
functions. This chapter deals with more practical matters that are relevant to 
computations of contour integrations on a complex plane. The theorem below 
is central to the development of this topic. 


♠ Residue theorem:
If a function $f(z)$ is analytic everywhere within a closed contour $C$ except
at a finite number of poles, its contour integral along $C$ yields

$$\oint_C f(z)\,dz = 2\pi i \sum_j \operatorname{Res}(f, a_j). \tag{9.1}$$

Here, $\operatorname{Res}(f, a_j)$ is called the residue of $f(z)$ at the pole $z = a_j$. When the
pole is $m$th order, it reads

$$\operatorname{Res}(f, a_j) = \frac{1}{(m - 1)!} \lim_{z \to a_j} \frac{d^{m-1}}{dz^{m-1}} \left[(z - a_j)^m f(z)\right]. \tag{9.2}$$


Once the residue is evaluated, the integral $\oint_C f(z)\,dz$ around the contour $C$
surrounding the pole $z = a$ can be determined by the above theorem. Notably,
this theorem enables us to evaluate various kinds of integrals of real functions
that are unfeasible by means of elementary calculus.

Before demonstrating the utility of the residue theorem, we present a short
review of the nature of residues. Originally, the residue of $f(z)$ is defined in
association with a particular coefficient of the Laurent series expansion. We
know that $f(z)$ around its pole at $z = a$ may be expressed by a Laurent series
expansion such as

$$f(a + h) = \sum_{n=-\infty}^{\infty} c_n h^n, \qquad c_n = \frac{1}{2\pi i} \oint_C \frac{f(z)}{(z - a)^{n+1}}\,dz.$$

Then, the specific coefficient

$$c_{-1} = \frac{1}{2\pi i} \oint_C f(z)\,dz \tag{9.3}$$

is called the residue of $f(z)$ at $z = a$. In fact, the result (9.3) immediately
reduces to the form of (9.1) as

$$\oint_C f(z)\,dz = 2\pi i\, c_{-1}.$$

The equivalence of the two quantities, $\operatorname{Res}(f, a)$ in (9.2) and $c_{-1}$ in (9.3), is
verified as follows.


Proof (of the residue theorem). Suppose that $f(z)$ has a pole of order $m$ at $a$.
Then $f(z)$ can be written as

$$f(z) = \frac{c_{-m}}{(z - a)^m} + \frac{c_{-m+1}}{(z - a)^{m-1}} + \cdots + \frac{c_{-1}}{z - a} + \sum_{n=0}^{\infty} c_n (z - a)^n. \tag{9.4}$$

Now we introduce the quantity

$$g(z) = (z - a)^m f(z) = c_{-m} + c_{-m+1}(z - a) + \cdots = \sum_{n=0}^{\infty} c_{n-m} (z - a)^n. \tag{9.5}$$

Since $g(z)$ is analytic everywhere in a neighborhood around $a$, it can be
expanded in terms of a Taylor series as

$$g(z) = \sum_{n=0}^{\infty} \frac{g^{(n)}(a)}{n!} (z - a)^n. \tag{9.6}$$

The residue $c_{-1}$ is the coefficient of the $n = m - 1$ term in (9.5). Hence,
comparing (9.5) with (9.6), we have

$$c_{-1} = \frac{1}{(m - 1)!}\, g^{(m-1)}(a) = \frac{1}{(m - 1)!} \lim_{z \to a} \frac{d^{m-1}}{dz^{m-1}} \left[(z - a)^m f(z)\right], \tag{9.7}$$

which is simply equation (9.2). ♣


9.1.2 Remarks on Residues 


The reason that only the particular coefficient $c_{-1}$ plays a role in evaluating
the contour integral is clarified by integrating both sides of (9.4) along the
contour containing the $m$th-order pole $a$. For convenience, we rewrite (9.4) as

$$f(z) = \sum_{n=1}^{m} \frac{c_{-n}}{(z - a)^n} + \psi_a(z), \tag{9.8}$$

where
$$\psi_a(z) = \sum_{n=0}^{\infty} c_n (z - a)^n$$
is the regular part of the series (9.8), thus being analytic everywhere in a
region within a closed contour $C$ containing $a$. By integrating $f(z)$ along the
contour $C$, we set

$$\oint_C f(z)\,dz = \sum_{n=1}^{m} c_{-n} \oint_C \frac{dz}{(z - a)^n} \tag{9.9}$$

because of the analyticity of $\psi_a(z)$. The integral of (9.9) can be easily eval-
uated by letting the contour be a circle of radius $\rho$ centered at $a$. Since any
point on the contour can be expressed as $z = a + \rho e^{i\theta}$, we have

$$\oint_C \frac{dz}{(z - a)^n} = \int_0^{2\pi} \frac{i \rho e^{i\theta}}{\rho^n e^{in\theta}}\,d\theta = \frac{i}{\rho^{n-1}} \int_0^{2\pi} e^{i(1-n)\theta}\,d\theta. \tag{9.10}$$

Note that the integral (9.10) vanishes for all $n \neq 1$, and it is only when $n = 1$
that it has a nonzero value:

$$\oint_C \frac{dz}{z - a} = i \int_{\theta_0}^{\theta_0 + 2\pi} d\theta = 2\pi i.$$

Therefore, all the terms in the sum of (9.9) are zero except the $n = 1$ term,
and Goursat's formula takes the form

$$\oint_C f(z)\,dz = 2\pi i\, c_{-1}. \tag{9.11}$$

In short, once we integrate the function $f(z)$ in (9.8), only the term involving
$c_{-1}$ survives, whereas the other terms vanish. This results in the fact that the
contour integral $\oint_C f(z)\,dz$ around a pole is determined by the value of the
specific coefficient $c_{-1}$.
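The statement that only the $n = 1$ term survives can be verified numerically. The sketch below is illustrative: the helper name `circle_integral`, the centre, the radius, and the number of sample points are assumed choices, and the contour integral is approximated by an equally spaced sum over the angle $\theta$.

```python
import cmath
import math

def circle_integral(func, center, radius, samples=2000):
    """Approximate the integral of func around the circle |z - center| = radius,
    using z = center + radius * exp(i theta) with equally spaced theta."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        offset = radius * cmath.exp(1j * j * step)
        total += func(center + offset) * 1j * offset * step   # f(z) dz
    return total

a = 0.3 - 0.2j    # hypothetical location of the pole
rho = 0.7         # hypothetical radius of the circle
integrals = {n: circle_integral(lambda z, n=n: 1 / (z - a) ** n, a, rho)
             for n in range(1, 5)}
for n, value in integrals.items():
    print(n, value)
```

For $n = 1$ the result should be close to $2\pi i$; for $n = 2, 3, 4$ it should vanish up to rounding error, independent of the radius chosen.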


9.1.3 Winding Number 


To evaluate $\oint_C f(z)\,dz$ when $C$ is a general closed curve (and when $f$ may
have isolated singularities), we introduce the following concept:


♠ Winding number:
Suppose that $C$ is a closed curve and that the point $z = a$ is not located
on $C$. Then the number
$$n(C, a) = \frac{1}{2\pi i} \oint_C \frac{dz}{z - a}$$
is called the winding number of $C$ around $a$.


Note that if $C$ represents the boundary of a circle (traversed counterclockwise),
then the winding number reads

$$n(C, a) = \begin{cases} 1 & \text{if } a \text{ is inside the circle,} \\ 0 & \text{if } a \text{ is outside the circle.} \end{cases}$$

Both identities have already been proven in the context of Cauchy's theorem.
In addition, if the curve $C$ encloses $k$ times the point $a$, then we have

$$n(C, a) = \frac{1}{2\pi i} \int_0^{2k\pi} i\,d\theta = k,$$
which explains the terminology "winding number."
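A winding number can be estimated numerically for a parametrized closed curve. The sketch below is an illustration under assumptions: instead of discretizing the integral directly, it sums the small changes of $\arg(z - a)$ along a fine polygonal approximation of the curve, which equals the imaginary part of $\oint dz/(z - a)$. The function name and the sample curve are hypothetical.

```python
import cmath
import math

def winding_number(curve, a, samples=4000):
    """Estimate n(C, a) for a closed curve z(t), 0 <= t <= 1, by accumulating
    the change in arg(z(t) - a) between consecutive sample points."""
    total = 0.0
    previous = curve(0.0) - a
    for j in range(1, samples + 1):
        current = curve(j / samples) - a
        total += cmath.phase(current / previous)   # small angle increment
        previous = current
    return total / (2 * math.pi)

def triple_circle(t):
    """Unit circle about the origin, traversed three times counterclockwise."""
    return cmath.exp(2j * math.pi * 3 * t)

inside = winding_number(triple_circle, 0.2 + 0.1j)
outside = winding_number(triple_circle, 2.0)
print(inside, outside)
```

The estimate is reliable as long as consecutive sample points subtend an angle smaller than $\pi$ at $a$, which the fine sampling ensures for points not too close to the curve.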


♠ Theorem:
For any closed curve $C$ and point $a \notin C$, the winding number $n(C, a)$ is
an integer.


Proof Suppose that $C$ is parametrized by $z(t)$, $0 \leq t \leq 1$, and set

$$F(s) = \int_0^s \frac{z'(t)}{z(t) - a}\,dt \quad (0 \leq s \leq 1).$$

Then, it follows from

$$F'(s) = \frac{z'(s)}{z(s) - a}$$

that the quantity
$$[z(s) - a]\, e^{-F(s)}$$
is a constant, and setting $s = 0$, we have

$$[z(s) - a]\, e^{-F(s)} = z(0) - a.$$

Hence,

$$e^{F(1)} = \frac{z(1) - a}{z(0) - a} = 1,$$

since $C$ is closed, i.e., $z(1) = z(0)$. Thus
$$F(1) = 2\pi k i \quad \text{for some integer } k$$
and
$$n(C, a) = \frac{1}{2\pi i} F(1) = k. \quad ♣$$

In terms of the winding number, the residue theorem given in Sect. 9.1.1
can be restated as follows:


♠ Residue theorem (restated):

Suppose $f(z)$ is analytic in a simply connected domain $D$ except for
isolated singularities at $z_1, z_2, \cdots, z_m$. Let $C$ be a closed curve that does
not intersect any of the singularities. Then

$$\oint_C f(z)\,dz = 2\pi i \sum_{k=1}^{m} n(C, z_k) \operatorname{Res}(f, z_k). \tag{9.12}$$

The proof is left to the reader.


9.1.4 Ratio Method 


We saw in Sect. 7.4.5 that a function having a pole of order $m$ can be expressed
by the ratio of two polynomials such as

$$f(z) = \frac{p(z)}{q(z)}. \tag{9.13}$$

In this case, it is possible to formulate an alternative equation that determines
the residue of $f(z)$. Employing such an equation to evaluate the residue is
referred to as a ratio method.

To derive these equations, we first recall the fact that if a function $p(z)$
satisfies

$$p(a) = p'(a) = \cdots = p^{(m-1)}(a) = 0 \quad \text{and} \quad p^{(m)}(a) \neq 0,$$

the Taylor series for $p(z)$ is given by

$$p(z) = \frac{p^{(m)}(a)}{m!} (z - a)^m + \text{h.o.},$$

where h.o. means the terms of higher order. Such a function, for which the
lowest power of $(z - a)$ is $m$, is said to have an $m$th-order zero at $a$.

Now we present the equation for the residue of $f(z)$ at a simple pole $a$.
As seen from (9.13), a simple pole of $f(z)$ arises from the fact that $p(z)$ has a
zero of $(m - 1)$th order and $q(z)$ has a zero of order $m$. Then,

$$f(z) = \frac{\dfrac{p^{(m-1)}(a)}{(m - 1)!} (z - a)^{m-1} + \text{h.o.}}{\dfrac{q^{(m)}(a)}{m!} (z - a)^m + \text{h.o.}}.$$

For such a function, we obtain the residue of $f$ at the simple pole $a$ as

$$c_{-1} = \lim_{z \to a} (z - a) f(z) = m\, \frac{p^{(m-1)}(a)}{q^{(m)}(a)}. \tag{9.14}$$

By means of (9.14), we can compute the residue of $f(z)$ at a simple pole $a$ quite
easily.

Next we consider the equation for a second-order pole of $f(z)$ at $a$. Such
a pole arises when $p(z)$ has a zero of order $m$ and $q(z)$ has a zero of order
$(m + 2)$ at $a$. Then,

$$f(z) = \frac{\dfrac{p^{(m)}(a)}{m!} (z - a)^m + \dfrac{p^{(m+1)}(a)}{(m + 1)!} (z - a)^{m+1} + \text{h.o.}}{\dfrac{q^{(m+2)}(a)}{(m + 2)!} (z - a)^{m+2} + \dfrac{q^{(m+3)}(a)}{(m + 3)!} (z - a)^{m+3} + \text{h.o.}},$$

from which we set

$$c_{-1} = \lim_{z \to a} \frac{d}{dz} \left[(z - a)^2 f(z)\right] = \frac{m + 2}{m + 3} \cdot \frac{(m + 3)\, p^{(m+1)}(a)\, q^{(m+2)}(a) - (m + 1)\, p^{(m)}(a)\, q^{(m+3)}(a)}{\left[q^{(m+2)}(a)\right]^2}. \tag{9.15}$$

For example, if the second-order pole of a function arises from a second-order
zero of $q(z)$, then $m = 0$. The residue of such a pole is given by (9.15) as

$$c_{-1} = \frac{2}{3} \cdot \frac{3 p'(a) q''(a) - p(a) q'''(a)}{\left[q''(a)\right]^2}. \tag{9.16}$$


9.1.5 Evaluating the Residues 


In what follows, we demonstrate actual procedures to evaluate the residue
by means of the three methods discussed in the previous subsections. As an
instructive example, we consider the function

$$f(z) = \frac{e^z}{z(z + 2)^2},$$

which has a simple pole at $z = 0$ and a second-order pole at $z = -2$.


Using a Laurent expansion:


The present purpose is to evaluate the coefficient $c_{-1}$ of the Laurent series
expansion of $f(z)$ around the poles at $z = 0$ and $z = -2$. In order to do this
we first determine the Taylor series for the factor $e^z/(z + 2)^2$ around $z = 0$.
Since the expressions

$$e^z = 1 + z + \frac{z^2}{2!} + \cdots$$

and

$$\frac{1}{(z + 2)^2} = \frac{1}{4[1 + (z/2)]^2} = \frac{1}{4} \left(1 - z + \frac{3}{4} z^2 - \cdots\right)$$

hold around $z = 0$, we have

$$\frac{e^z}{z(z + 2)^2} = \frac{1}{4z} \left(1 + z + \frac{z^2}{2} + \cdots\right) \left(1 - z + \frac{3}{4} z^2 - \cdots\right) = \frac{1}{4z} + \frac{z}{16} + \cdots.$$

Thus, we immediately obtain

$$c_{-1}(0) = \frac{1}{4}.$$

Similarly we have

$$c_{-1}(-2) = -\frac{3}{4e^2} \quad \text{(see Exercise 1)}.$$


Using Goursat's formula:


The residue of the simple pole at $z = 0$ is given by

$$c_{-1}(0) = \lim_{z \to 0} \left[z \cdot \frac{e^z}{z(z + 2)^2}\right] = \frac{1}{4},$$

and that of the second-order pole at $z = -2$ is given by

$$c_{-1}(-2) = \frac{1}{1!} \lim_{z \to -2} \frac{d}{dz} \left[(z + 2)^2 \frac{e^z}{z(z + 2)^2}\right] = \lim_{z \to -2} \frac{d}{dz} \left(\frac{e^z}{z}\right) = -\frac{3}{4e^2}.$$


Using the ratio method:


For this example, the numerator and denominator functions can be chosen in
different ways. For the residue at $z = 0$, we could take

$$p(z) = e^z, \quad q(z) = z(z + 2)^2$$

or, alternatively,

$$p(z) = \frac{e^z}{(z + 2)^2}, \quad q(z) = z.$$

For either choice, the residue for the simple pole is given by (9.14) with $m = 1$ as

$$c_{-1}(0) = \frac{p(0)}{q'(0)} = \frac{1}{4}.$$

The residue $c_{-1}(-2)$ can be obtained in a similar manner as above (see
Exercise 2).
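The residues found above can be cross-checked numerically. The following sketch is an illustration, not one of the three methods of the text: it approximates $c_{-1} = (1/2\pi i)\oint f(z)\,dz$ on a small circle around each pole (the radius 0.5 and the sample count are assumptions) and, separately, evaluates the ratio formula (9.16) with hand-computed derivatives of $p(z) = e^z$ and $q(z) = z(z + 2)^2$ at $z = -2$.

```python
import cmath
import math

def f(z):
    return cmath.exp(z) / (z * (z + 2) ** 2)

def residue_by_contour(func, pole, radius=0.5, samples=2000):
    """Approximate (1 / (2 pi i)) times the integral of func around a small
    circle centred at the pole, using equally spaced points on the circle."""
    total = 0j
    step = 2 * math.pi / samples
    for j in range(samples):
        offset = radius * cmath.exp(1j * j * step)
        total += func(pole + offset) * 1j * offset * step   # f(z) dz
    return total / (2j * math.pi)

def residue_ratio_second_order(p, dp, d2q, d3q):
    """Formula (9.16): (2/3) * (3 p' q'' - p q''') / (q'')^2 at the pole."""
    return (2 / 3) * (3 * dp * d2q - p * d3q) / d2q ** 2

residue_at_0 = residue_by_contour(f, 0.0)
residue_at_minus_2 = residue_by_contour(f, -2.0)

# At z = -2: p = p' = e^(-2); q'' = 6z + 8 = -4; q''' = 6.
ratio_value = residue_ratio_second_order(math.exp(-2), math.exp(-2), -4.0, 6.0)
print(residue_at_0, residue_at_minus_2, ratio_value)
```

The circles of radius 0.5 each enclose exactly one pole, since the two poles are a distance 2 apart.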


Exercises 


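1. Evaluate the residue of

$$f(z) = \frac{e^z}{z(z + 2)^2}$$

at $z = -2$ by using a Laurent expansion.


Solution: The residue of $f(z)$ at $z = -2$ is found by using the
expression

$$e^z = e^{-2} e^{z+2} = e^{-2} \sum_{n=0}^{\infty} \frac{(z + 2)^n}{n!} = e^{-2} \left[1 + (z + 2) + \frac{(z + 2)^2}{2!} + \cdots\right]$$

and the Taylor series expansion for $1/z$ around $z = -2$ as

$$\frac{1}{z} = -\frac{1}{2} \cdot \frac{1}{1 - (z + 2)/2} = -\frac{1}{2} \left[1 + \frac{z + 2}{2} + \frac{(z + 2)^2}{4} + \cdots\right].$$

Thus, the Laurent series for $f(z)$ around $z = -2$ is

$$\frac{e^z}{z(z + 2)^2} = -\frac{1}{2} e^{-2} \left[\frac{1}{(z + 2)^2} + \frac{3}{2} \cdot \frac{1}{z + 2} + \frac{5}{4} + \cdots\right],$$

from which we have

$$c_{-1}(-2) = -\frac{3}{4e^2}. \quad ♣$$


2. Evaluate the residue of

$$f(z) = \frac{e^z}{z(z + 2)^2}$$

at $z = -2$ using the ratio method.
Solution: For the pole at $z = -2$, we can choose either

$$p(z) = e^z, \quad q(z) = z(z + 2)^2$$
as before or
$$p(z) = \frac{e^z}{z}, \quad q(z) = (z + 2)^2.$$
Then, regardless of how the numerator and denominator are cho-
sen, we refer to (9.16) to obtain

$$c_{-1}(-2) = \frac{2}{3} \cdot \frac{3 p'(-2) q''(-2) - p(-2) q'''(-2)}{\left[q''(-2)\right]^2} = -\frac{3}{4e^2}. \quad ♣$$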


9.2 Applications to Real Integrals 


9.2.1 Classification of Evaluable Real Integrals 


Using the residue theorem, we can evaluate the five types of real integrals 
listed below. 


1. $\int_0^{2\pi} f(\cos\theta,\sin\theta)\,d\theta$, where $f(x,y)$ is a rational function without a pole on the circle $x^2+y^2=1$.

2. $\int_{-\infty}^{\infty} f(x)\,dx$, where $f(x)$ is a rational function without a real pole and is subject to the condition that $\lim_{|x|\to\infty} x f(x) = 0$.

3. $\int_{-\infty}^{\infty} f(x)e^{ix}\,dx$, where $f(z)$ is an analytic function in the upper half-plane Im z > 0 except at a finite number of points.

4. $\int_0^{\infty} f(x)/x^{\alpha}\,dx$, where $\alpha$ denotes a real number such that $0<\alpha<1$ and $f(z)$ is a rational function with no pole on the positive real axis x > 0, which satisfies the condition $f(z)/z^{\alpha-1}\to 0$ as $z\to 0$ and $z\to\infty$.

5. $\int_0^{\infty} f(x)\log x\,dx$, where $f(z)$ is a rational function with no pole on the positive real axis x > 0 and satisfies the condition $\lim_{x\to\infty} x f(x) = 0$.




In Sect. 9.2.2-9.2.6 we demonstrate actual processes for evaluating the 
above integrals. 


9.2.2 Type 1: Integrals of f(cos θ, sin θ)


Consider an integral of the form 


\[ \int_0^{2\pi} f(\cos\theta,\sin\theta)\,d\theta. \]

Setting $z=e^{i\theta}$ makes it a contour integral around the unit circle, and thus
the evaluation of the residues within the circle completes the integration.


Example We evaluate the integral
\[ I = \int_0^{2\pi}\frac{d\theta}{1-2p\cos\theta+p^2} \quad (0<p<1). \tag{9.17} \]

If we express $\cos\theta$ in terms of $z=e^{i\theta}$, (9.17) becomes a contour integral,
\[ I = \oint_C \frac{1}{1-p(z+z^{-1})+p^2}\,\frac{dz}{iz} = \frac{1}{i}\oint_C\frac{dz}{(1-pz)(z-p)}, \tag{9.18} \]
where C is a unit circle centered at the origin. The integrand in (9.18) has a
simple (first-order) pole at $z=p$ within C. Hence, we obtain
\[ I = \frac{1}{i}\times 2\pi i\lim_{z\to p}\frac{1}{1-pz} = \frac{2\pi}{1-p^2}. \]
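
The closed form of this Type 1 example can be checked numerically. The sketch below, which is an illustration added here and not taken from the original text, compares a trapezoidal sum over $\theta\in[0,2\pi]$ with $2\pi/(1-p^2)$; the function names and the number of sample points are assumptions.

```python
import math

def type1_integral(p, n=2000):
    # Trapezoidal rule on a periodic integrand: equally weighted samples.
    h = 2 * math.pi / n
    return h * sum(1.0 / (1 - 2 * p * math.cos(k * h) + p * p) for k in range(n))

def closed_form(p):
    # Value obtained from the residue at z = p inside the unit circle.
    return 2 * math.pi / (1 - p * p)

if __name__ == "__main__":
    for p in (0.2, 0.5, 0.8):
        print(p, type1_integral(p), closed_form(p))
```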


9.2.3 Type 2: Integrals of Rational Function 


Next consider the integral 
\[ I = \int_{-\infty}^{\infty} f(x)\,dx, \tag{9.19} \]


where f(x) is a rational function subject to the condition 


\[ \lim_{|x|\to\infty} x f(x) = 0, \]
which is a necessary and sufficient condition for the integral to be convergent. 
To evaluate (9.19), we consider the integral of $f(z)$ along a closed contour
consisting of the real axis from $-R$ to $+R$ and a semicircle $\Gamma(R)$ in the upper
half-plane. The contour integral is expressed as
\[ \oint f(z)\,dz = \int_{-R}^{R} f(x)\,dx + \int_{\Gamma(R)} f(z)\,dz. \tag{9.20} \]

From the Lemma below, it follows that the second term in (9.20) vanishes in
the limit $R\to\infty$. Hence, we obtain
\[ \lim_{R\to\infty}\oint f(z)\,dz = \int_{-\infty}^{\infty} f(x)\,dx, \tag{9.21} \]


and applying the residue theorem yields 
\[ \int_{-\infty}^{\infty} f(x)\,dx = 2\pi i\sum_j \mathrm{Res}(f, a_j), \]

where $a_j$ is the jth pole of $f(z)$ in the upper half-plane. Therefore, the evaluation
of the residues located within the upper half-plane completes the integration.


Example We prove the equation
\[ I = \int_{-\infty}^{\infty}\frac{dx}{1+x^2} = \pi. \]

Since $x/(1+x^2)$ vanishes as $|x|\to\infty$, we may follow a process similar to the
one discussed above. Since $z=i$ is the only pole of $1/(1+z^2) = 1/[(z+i)(z-i)]$
in the upper half-plane, we have
\[ I = (2\pi i)\cdot\mathrm{Res}(i) = 2\pi i\cdot\frac{1}{2i} = \pi. \]

Less simple examples will be found in the Exercises of Sect. 9.2.
As was noted earlier, our result (9.21) is based on the following lemma:
Lemma:

Let $f(z)$ be continuous in the sector $\theta_1\le\arg z\le\theta_2$. If
\[ \lim_{|z|\to\infty} z f(z) = 0 \quad \text{for } \theta_1\le\arg z\le\theta_2, \tag{9.22} \]
then the integral $\int f(z)\,dz$ extended over the arc of the circle $|z|=r$ contained
in the sector tends to 0 as $r\to\infty$.


Proof Let $M(r)$ be the upper bound of $|f(z)|$ on the arc of the circle $|z|=r$.
Then we have
\[ \left|\int f(z)\,dz\right| \le M(r)\int_{\theta_1}^{\theta_2} r\,d\theta = M(r)\cdot r(\theta_2-\theta_1). \tag{9.23} \]

In view of the condition (9.22), the right-hand side of (9.23) vanishes as $r\to\infty$.
This completes the proof.


9.2.4 Type 3: Integrals of f(x)e^{ix}


We now study integrals of the form
\[ \int_{-\infty}^{\infty} f(x)e^{ix}\,dx, \]


where $f$ is analytic on the upper half-plane Im z > 0 except at a finite number
of singularities (if they exist). We first consider the case when the singularities
are not on the real axis. Then, the integral
\[ \int_{-R}^{R} f(x)e^{ix}\,dx \]
has a meaning, which can be seen from the following theorem:


Theorem:
If $\lim_{|z|\to\infty} f(z) = 0$ for Im z > 0, then
\[ \lim_{R\to\infty}\int_{-R}^{R} f(x)e^{ix}\,dx = 2\pi i\sum \mathrm{Res}\left[f(z)e^{iz}\right], \]
the summation extending over the singularities of $f(z)$ contained in the
upper half-plane y > 0.


Before starting the proof, we note that $|e^{iz}|<1$ in the half-plane y > 0. This
leads us to integrate on the half-plane y > 0 along the contour used above for
an integral of type 2. To prove the theorem, it thus suffices to show that the
integral $\int_{\Gamma(r)} f(z)e^{iz}\,dz$ tends to 0 as r tends to ∞.

If we know in advance that $\lim_{|z|\to\infty} z f(z) = 0$, then it would be sufficient
to apply the lemma in Sect. 9.2.3. To prove that $\int_{\Gamma(r)} f(z)e^{iz}\,dz$ tends to 0
with only the hypothesis of the theorem above, we use the following lemma:


Jordan Lemma:

Let $f(z)$ be a function defined in a sector of the half-plane y > 0. If
$\lim_{|z|\to\infty} f(z) = 0$, the integral $\int f(z)e^{iz}\,dz$ extended over the arc of the
circle $|z|=r$ contained in the sector tends to 0 as r tends to ∞.


Proof Let us put $z=re^{i\theta}$ and let $M(r)$ be the upper bound of $|f(re^{i\theta})|$ as $\theta$
varies, the point $e^{i\theta}$ remaining in the sector. Then,
\[ \left|\int f(z)e^{iz}\,dz\right| \le M(r)\int_0^{\pi} e^{-r\sin\theta}\,r\,d\theta = 2M(r)\int_0^{\pi/2} e^{-r\sin\theta}\,r\,d\theta. \tag{9.24} \]


Since
\[ \frac{2}{\pi}\le\frac{\sin\theta}{\theta}\le 1 \quad \text{for } 0<\theta\le\frac{\pi}{2}, \]
we have
\[ \int_0^{\pi/2} e^{-r\sin\theta}\,r\,d\theta \le \int_0^{\pi/2} e^{-2r\theta/\pi}\,r\,d\theta \le \int_0^{\infty} e^{-2r\theta/\pi}\,r\,d\theta = \frac{\pi}{2}. \tag{9.25} \]

From (9.24) and (9.25), it follows that
\[ \left|\int f(z)e^{iz}\,dz\right| \le \pi M(r). \tag{9.26} \]

In view of our assumption $\lim_{|z|\to\infty} f(z) = 0$, the right-hand side of (9.26)
vanishes as $r\to\infty$, which completes the proof.


Remark. 


1. If we have to calculate an integral
\[ \int_{-\infty}^{\infty} f(x)e^{-ix}\,dx \]
that involves a negative imaginary exponential $e^{-ix}$, it would be necessary
to integrate in the lower half-plane instead of the upper one because the
function $|e^{-iz}|$ is bounded in the lower half-plane y < 0. More generally,
an integral of the form $\int_{-\infty}^{\infty} f(x)e^{ax}\,dx$ (where a is a complex constant) can
be evaluated by integrating in the half-plane where $|e^{az}|\le 1$.

2. Remember that sin z and cos z are not bounded in any half-plane. To
evaluate integrals of the form
\[ \int f(x)\sin^n x\,dx \quad\text{and}\quad \int f(x)\cos^n x\,dx, \]
we always express the trigonometric functions in terms of complex exponentials
so that the preceding methods can be applied.


9.2.5 Type 4: Integrals of f(x)/x^α


Consider integrals of the form
\[ \int_0^{\infty}\frac{f(x)}{x^{\alpha}}\,dx, \]
where $\alpha$ denotes a real number such that $0<\alpha<1$, and $f(z)$ is a rational
function with no pole on the positive real axis x > 0. In addition, we assume
$f(z)$ such that $f(z)/z^{\alpha-1}\to 0$ in the limits $z\to 0$ and $z\to\infty$.


To calculate such an integral, we consider the function
\[ g(z) = \frac{f(z)}{z^{\alpha}} \]
of the complex variable z, defined in the plane with the positive real axis
x > 0 excluded. Let D be the open set thus defined. It is necessary to specify
the branch of $z^{\alpha}$ chosen in D, so we take the branch of the argument of z
between 0 and 2π. With this convention, we integrate $g(z)$ along the closed
path $C(r,\epsilon)$ as follows: we first trace the real axis from $\epsilon>0$ to $r>0$, then
the circle $\Gamma(r)$ centered at the origin with radius r in the positive sense,
then the real axis from r to $\epsilon$, and finally, the circle $\gamma(\epsilon)$ of center 0 and radius
$\epsilon$ in the negative sense. The integral
\[ \oint_{C(r,\epsilon)}\frac{f(z)}{z^{\alpha}}\,dz \]
is equal to $2\pi i$ times the sum of the residues of $f(z)/z^{\alpha}$ at its poles contained in D if
r has been chosen sufficiently large and $\epsilon$ sufficiently small. We have


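\[ \oint_{C(r,\epsilon)}\frac{f(z)}{z^{\alpha}}\,dz = \int_{\Gamma(r)}\frac{f(z)}{z^{\alpha}}\,dz + \int_{\gamma(\epsilon)}\frac{f(z)}{z^{\alpha}}\,dz + \left(1-e^{-2\pi i\alpha}\right)\int_{\epsilon}^{r}\frac{f(x)}{x^{\alpha}}\,dx, \]
because when the argument of z is equal to 2π,
\[ z^{\alpha} = e^{2\pi i\alpha}|z|^{\alpha}. \]

From assumption, $f(z)/z^{\alpha-1}$ tends to 0 when z tends to 0 or when |z| tends
to infinity. Thus the integrals along $\Gamma(r)$ and $\gamma(\epsilon)$ tend to 0 as $r\to\infty$ and
$\epsilon\to 0$. In the limit, we have
\[ \left(1-e^{-2\pi i\alpha}\right)\int_0^{\infty}\frac{f(x)}{x^{\alpha}}\,dx = 2\pi i\sum \mathrm{Res}\left[\frac{f(z)}{z^{\alpha}}\right]. \tag{9.27} \]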


This relation allows us to calculate the original integral. 


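Example Try to evaluate the integral
\[ I = \int_0^{\infty}\frac{dx}{x^{\alpha}(1+x)} \quad (0<\alpha<1). \]

Here we have
\[ f(z) = \frac{1}{1+z}, \]
where there is only one pole at $z=-1$. As the branch of the argument of z is
equal to π at this point, the residue of $f(z)/z^{\alpha}$ at this pole is equal to $1/e^{i\pi\alpha}$.
Relation (9.27) then gives
\[ I = \frac{2\pi i\,e^{-i\pi\alpha}}{1-e^{-2\pi i\alpha}} = \frac{\pi}{\sin\pi\alpha}. \]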
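
A quick numerical check of this Type 4 result is sketched below. It is an added illustration under stated assumptions: the substitution $x=e^t$ (not used in the text) turns the integral into $\int_{-\infty}^{\infty} e^{(1-\alpha)t}/(1+e^t)\,dt$, whose integrand decays exponentially in both directions, so a plain trapezoidal sum on a finite window suffices. The function name and the window and step sizes are arbitrary choices.

```python
import math

def type4_integral(alpha, t_max=200.0, h=0.01):
    # Integral of 1 / (x^alpha (1 + x)) over (0, inf), after x = e^t.
    n = int(round(2 * t_max / h))
    total = 0.0
    for k in range(n + 1):
        t = -t_max + k * h
        if t > 0:
            # Same integrand, rewritten to avoid large exponentials.
            total += math.exp(-alpha * t) / (1 + math.exp(-t))
        else:
            total += math.exp((1 - alpha) * t) / (1 + math.exp(t))
    return h * total

if __name__ == "__main__":
    for alpha in (0.3, 0.5, 0.7):
        print(alpha, type4_integral(alpha), math.pi / math.sin(math.pi * alpha))
```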


9.2.6 Type 5: Integrals of f(x) log x 


The final type of integral to be noted is a class of the form 


\[ \int_0^{\infty} f(x)\log x\,dx, \]

where $f$ is a rational function with no pole on the positive real axis x > 0 and
\[ \lim_{x\to\infty} x f(x) = 0. \]


This last condition ensures that the integral is convergent. 

We consider the same open set D as for integrals of Type 4 and the same
path of integration. Here again, we must specify the branch chosen for log z,
and we choose the argument of z between 0 and 2π. For a reason that will
soon be apparent, we integrate the function $f(z)(\log z)^2$ instead of $f(z)\log z$.
Here again the integrals along the circles $\Gamma(r)$ and $\gamma(\epsilon)$ tend to 0 as $r\to\infty$
and $\epsilon\to 0$, respectively.

When the argument of z is equal to 2π, we have
\[ \log z = \log x + 2\pi i. \]

Thus we have the relation
\[ \int_0^{\infty} f(x)(\log x)^2\,dx - \int_0^{\infty} f(x)(\log x+2\pi i)^2\,dx = 2\pi i\sum \mathrm{Res}\left[f(z)(\log z)^2\right], \]
and, hence,
\[ -2\int_0^{\infty} f(x)\log x\,dx - 2\pi i\int_0^{\infty} f(x)\,dx = \sum \mathrm{Res}\left[f(z)(\log z)^2\right]. \tag{9.28} \]

When $f(x)$ is real on the positive real axis, taking the real part of the relation (9.28) gives the desired
result:
\[ \int_0^{\infty} f(x)\log x\,dx = -\frac{1}{2}\,\mathrm{Re}\left\{\sum \mathrm{Res}\left[f(z)(\log z)^2\right]\right\}. \]


Example Consider the integral
\[ I = \int_0^{\infty}\frac{\log x}{(1+x)^3}\,dx. \]

As the residue of $(\log z)^2/(1+z)^3$ at the pole $z=-1$ is equal to $1-i\pi$, we
find
\[ I = -\frac{1}{2}. \]
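
The value $-1/2$ can be confirmed numerically. The following sketch is an added illustration, not the book's method: it assumes the substitution $x=e^t$, which gives $\int_{-\infty}^{\infty} t\,e^t/(1+e^t)^3\,dt$, and applies the trapezoidal rule on a finite window. The names and step sizes are assumptions.

```python
import math

def integrand(t):
    # log(x) / (1 + x)^3 dx with x = e^t becomes t e^t / (1 + e^t)^3 dt.
    if t > 0:
        return t * math.exp(-2 * t) / (1 + math.exp(-t)) ** 3
    return t * math.exp(t) / (1 + math.exp(t)) ** 3

def type5_integral(t_max=60.0, h=0.01):
    n = int(round(2 * t_max / h))
    total = 0.5 * (integrand(-t_max) + integrand(t_max))
    for k in range(1, n):
        total += integrand(-t_max + k * h)
    return h * total

# Prediction from the residue 1 - i*pi of (log z)^2 / (1 + z)^3 at z = -1.
residue_value = complex(1.0, -math.pi)
predicted = -0.5 * residue_value.real

if __name__ == "__main__":
    print(type5_integral(), predicted)
```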




Exercises 


1. Evaluate the integral defined by $I = \int_0^{2\pi}\frac{d\theta}{(1+a\cos\theta)^2}$ $(0<a<1)$.


Solution: Let $z=e^{i\theta}$ and set C: |z| = 1. Then
\[ I = \frac{4}{ia^2}\oint_C\frac{z\,dz}{\left[z^2+(2z/a)+1\right]^2}. \]

The integrand has two poles of second order at $z=z_1, z_2$ ($|z_1|<|z_2|$),
which are the solutions of the equation $g(z) = z^2+(2z/a)+1 = 0$.
Since $0<a<1$, only the pole $z_1 = (-1+\sqrt{1-a^2})/a$ is
found within C. The residue at $z_1$ is given by
\[ \mathrm{Res}(z_1) = \lim_{z\to z_1}\frac{d}{dz}\left[(z-z_1)^2\frac{z}{g(z)^2}\right] = \lim_{z\to z_1}\frac{d}{dz}\frac{z}{(z-z_2)^2} = -\frac{z_1+z_2}{(z_1-z_2)^3} = \frac{2/a}{\left(2\sqrt{1-a^2}/a\right)^3}, \]
and thus we obtain
\[ I = \frac{4}{ia^2}\times 2\pi i\,\mathrm{Res}(z_1) = \frac{2\pi}{(1-a^2)^{3/2}}. \]

2. Evaluate the integral $I = \frac{1}{2\pi i}\oint_C\frac{e^z}{z^n}\,dz$ (C: |z| = 1) for integer n.


Solution: For integers $n\le 0$, it is apparent that $I=0$ since
the integrand is analytic within and on C. For integers $n>0$,
$f(z) = e^z z^{-n}$ has a pole of order n at $z=0$. Using the residue
theorem, we have $I = 1/(n-1)!$.
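3. Calculate the integral $I = \int_{-\infty}^{\infty}\frac{dx}{(1+x^2)^{n+1}}$.
Solution: Define the function $f(z) = 1/(1+z^2)^{n+1}$, and set the
semicircle C as shown in Fig. 9.1. Within C, $f(z)$ has the pole of
(n+1)th order at $z=i$, and its residue reads
\[ \frac{1}{n!}\left[\frac{d^n}{dz^n}\frac{(z-i)^{n+1}}{(1+z^2)^{n+1}}\right]_{z=i} = \frac{(-1)^n(n+1)(n+2)\cdots(2n)}{n!\,(2i)^{2n+1}} = \frac{(2n)!}{2^{2n}(n!)^2}\,\frac{1}{2i}. \]

Hence, in view of Cauchy's residue theorem, we have
\[ \oint_C\frac{dz}{(1+z^2)^{n+1}} = \frac{\pi(2n)!}{2^{2n}(n!)^2}. \tag{9.29} \]
We now observe that
\[ \oint_C\frac{dz}{(1+z^2)^{n+1}} = \int_{-R}^{R}\frac{dx}{(1+x^2)^{n+1}} + \int_{\Gamma}\frac{dz}{(1+z^2)^{n+1}}, \tag{9.30} \]
where $\Gamma$ denotes the upper half-circle. Since $|1+z^2|\ge R^2-1$ on
$\Gamma$, the second integral in the limit $R\to\infty$ yields
\[ \left|\int_{\Gamma}\frac{dz}{(1+z^2)^{n+1}}\right| \le \frac{\pi R}{(R^2-1)^{n+1}} \to 0. \tag{9.31} \]
From (9.29)-(9.31), we conclude that
\[ I = \frac{\pi(2n)!}{2^{2n}(n!)^2}. \]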
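
The formula of Exercise 3 can be tested for small n with the sketch below. This is an added illustration under an assumption not made in the text: the substitution $x=\tan t$ converts the integral into $\int_{-\pi/2}^{\pi/2}\cos^{2n}t\,dt$, which a midpoint rule handles easily. The function names are hypothetical.

```python
import math

def exercise3_integral(n, m=2000):
    # Integral of dx / (1 + x^2)^(n+1) over the real line, via x = tan t.
    h = math.pi / m
    return h * sum(math.cos(-math.pi / 2 + (k + 0.5) * h) ** (2 * n)
                   for k in range(m))

def closed_form(n):
    # pi (2n)! / (2^(2n) (n!)^2), the value obtained from the residue at z = i.
    return math.pi * math.factorial(2 * n) / (4 ** n * math.factorial(n) ** 2)

if __name__ == "__main__":
    for n in range(6):
        print(n, exercise3_integral(n), closed_form(n))
```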


Fig. 9.1. The integration path used in Exercise 3 


4. Calculate the integral $I = \int_0^{2\pi}\log(1-2r\cos\theta+r^2)\,d\theta$, where $r\ne 1$.


Solution: First we assume that $r<1$. Observe that the function
$\log(1-z)/z = -1-(z/2)-(z^2/3)-\cdots$ is analytic for $|z|\le r<1$.
Hence, if we set the circle C: |z| = r, we have
\[ \oint_C\frac{\log(1-z)}{z}\,dz = i\int_0^{2\pi}\log(1-z)\,d\theta = 0. \tag{9.32} \]

Since $|1-z|^2 = 1-2r\cos\theta+r^2$ on C, the contribution of the real part of
$\log(1-z)$ to the second integral in (9.32) reads
$(i/2)\int_0^{2\pi}\log(1-2r\cos\theta+r^2)\,d\theta = 0$, so we get
\[ I = 0 \quad \text{for } r<1. \]

Next we consider the case of $r>1$. Set $s=1/r<1$ to obtain
\[ 0 = \int_0^{2\pi}\log(1-2s\cos\theta+s^2)\,d\theta = \int_0^{2\pi}\log\left(1-\frac{2}{r}\cos\theta+\frac{1}{r^2}\right)d\theta = \int_0^{2\pi}\left[\log(1-2r\cos\theta+r^2)-\log r^2\right]d\theta. \]

Hence, we conclude that
\[ I = 2\pi\log r^2 = 4\pi\log r \quad \text{for } r>1. \]
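
The two cases of Exercise 4 can be checked numerically. The sketch below is an added illustration; it assumes the integration range is $0\le\theta\le 2\pi$, in which case the last displayed identity of the solution gives $\int_0^{2\pi}\log r^2\,d\theta = 4\pi\log r$ for $r>1$. The function names are hypothetical.

```python
import math

def log_integral(r, n=4000):
    # Trapezoidal rule for the periodic integrand log(1 - 2 r cos(theta) + r^2).
    h = 2 * math.pi / n
    return h * sum(math.log(1 - 2 * r * math.cos(k * h) + r * r)
                   for k in range(n))

def predicted(r):
    # 0 for r < 1; integral of log(r^2) over [0, 2*pi] for r > 1.
    return 0.0 if r < 1 else 4 * math.pi * math.log(r)

if __name__ == "__main__":
    for r in (0.5, 2.0, 3.0):
        print(r, log_integral(r), predicted(r))
```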


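5. Calculate the integral $I = \int_0^{\infty}\frac{x^{a-1}}{1+x}\,dx$, where $0<a<1$.

Solution: Consider the power function
\[ z^{\beta} = e^{\beta\log z} = e^{\beta(\log|z|+i\arg z)} \]
with $-1<\beta<0$. Its branch for $0<\arg z<2\pi$ is single-valued on
the domain D enclosed by the contour $C = AB+\Gamma+B'A'+\gamma$
depicted in Fig. 9.2. Let the radius r of the circle $\gamma$ be sufficiently
small and that R of $\Gamma$ be sufficiently large. Then, the pole $z=-1$
of the function $f(z) = z^{\beta}/(1+z)$ is located within C so that we
have
\[ \oint_C f(z)\,dz = \int_r^R\frac{x^{\beta}}{1+x}\,dx + \int_{\Gamma} f(z)\,dz - \int_r^R\frac{e^{2\pi i\beta}x^{\beta}}{1+x}\,dx + \int_{\gamma} f(z)\,dz. \tag{9.33} \]

Observe that
\[ \left|\int_{\Gamma} f(z)\,dz\right| \le \int_{\Gamma}\frac{|z^{\beta}|}{|1+z|}\,|dz| \le \frac{2\pi R^{\beta+1}}{R-1} \to 0 \quad (R\to\infty) \]
and
\[ \left|\int_{\gamma} f(z)\,dz\right| \le \frac{2\pi r^{\beta+1}}{1-r} \to 0 \quad (r\to 0). \]
Take the limits $R\to\infty$ and $r\to 0$ on both sides of (9.33) to yield
\[ \left(1-e^{2\pi i\beta}\right)\int_0^{\infty}\frac{x^{\beta}}{1+x}\,dx = 2\pi i\,\mathrm{Res}\left[\frac{z^{\beta}}{1+z},-1\right] = 2\pi i\lim_{z\to -1} z^{\beta} = 2\pi i\,e^{\beta\pi i}, \]
which then gives us
\[ \int_0^{\infty}\frac{x^{\beta}}{1+x}\,dx = -\frac{\pi}{\sin\beta\pi}. \]

Since $\beta = a-1$, the above result is equivalent to
\[ I = \frac{\pi}{\sin a\pi}. \]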


Fig. 9.2. Integration path C = AB + Γ + B′A′ + γ used in Exercise 5


9.3 More Applications of Residue Calculus 


9.3.1 Integrals on Rectangular Contours 


The integrals discussed so far are evaluated using the residue theorem based 
on a circular (or semicircular) contour whose radius is eventually made to be 
infinitely large or infinitely small. However, there are other integrals that can 
be evaluated by the residue theorem that do not have to be closed with a 
circle. Several examples are given below. 

Let us consider the integral 


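\[ I = \int_{-\infty}^{\infty}\frac{x e^x}{(1+e^{2x})^2}\,dx. \]

To evaluate it, we examine the contour integral
\[ J = \oint\frac{z e^z}{(1+e^{2z})^2}\,dz \tag{9.34} \]
around the rectangular contour shown in Fig. 9.3. Beginning at the lower left-hand
corner of the rectangle,
\[ J = \int_{-L}^{L}\frac{x e^x}{(1+e^{2x})^2}\,dx + \int_0^{\pi}\frac{(L+iy)e^{L+iy}}{(1+e^{2(L+iy)})^2}\,i\,dy + \int_{L}^{-L}\frac{(x+i\pi)e^{x+i\pi}}{(1+e^{2(x+i\pi)})^2}\,dx + \int_{\pi}^{0}\frac{(-L+iy)e^{-L+iy}}{(1+e^{2(-L+iy)})^2}\,i\,dy. \tag{9.35} \]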


Fig. 9.3. Rectangular contour surrounding the pole z = πi/2


In the limit $L\to\infty$, the second and fourth integrals of (9.35) go to zero, since
in this limit the magnitudes of $e^{2(L+iy)}$ and $e^{2(-L+iy)}$ become very large or
very small, respectively, compared to unity. Hence, we have
\[ \lim_{L\to\infty} J = \int_{-\infty}^{\infty}\frac{x e^x}{(1+e^{2x})^2}\,dx + \int_{\infty}^{-\infty}\frac{(x+i\pi)e^{x+i\pi}}{(1+e^{2(x+i\pi)})^2}\,dx = 2\int_{-\infty}^{\infty}\frac{x e^x}{(1+e^{2x})^2}\,dx + i\pi\int_{-\infty}^{\infty}\frac{e^x}{(1+e^{2x})^2}\,dx, \tag{9.36} \]
where we have used the expressions $e^{x+i\pi} = -e^x$ and $e^{2(x+i\pi)} = e^{2x}$. As a
result, the integral I to be evaluated is expressed in terms of J as
\[ I = \frac{1}{2}J - \frac{i\pi}{2}\int_{-\infty}^{\infty}\frac{e^x}{(1+e^{2x})^2}\,dx. \tag{9.37} \]


The contour integral J is readily evaluated by employing the residue theorem.
Looking back to the definition (9.34), we see that the integrand of J has second-order
poles at the values of z for which $e^{2z} = -1$. These values are
\[ z = \pm\frac{i\pi}{2}, \pm\frac{3i\pi}{2}, \cdots, \pm i\left(N+\frac{1}{2}\right)\pi, \]
where N is a nonnegative integer. Note that only the pole at $z = i\pi/2$ is
enclosed in the rectangle (see Fig. 9.3). Hence, using the ratio method (see
Sect. 9.1.4) we have
\[ J = 2\pi i\,\mathrm{Res}\left(\frac{i\pi}{2}\right) = 2\pi i\cdot\frac{2}{3}\left.\frac{3p'(z)q''(z)-p(z)q'''(z)}{[q''(z)]^2}\right|_{z=i\pi/2} = -\frac{\pi(2-i\pi)}{4}, \tag{9.38} \]
where $p(z) = z e^z$ and $q(z) = (1+e^{2z})^2$ are constituents of the integrand in
(9.34). Here $p(i\pi/2) = -\pi/2$, $p'(i\pi/2) = i-\pi/2$, $q''(i\pi/2) = 8$, and $q'''(i\pi/2) = 48$.
The latter integral of (9.37) is evaluated by substituting $w = e^x$, and it
follows that
\[ \int_{-\infty}^{\infty}\frac{e^x}{(1+e^{2x})^2}\,dx = \int_0^{\infty}\frac{dw}{(1+w^2)^2} = \frac{1}{2}\int_{-\infty}^{\infty}\frac{dw}{(1+w^2)^2}. \tag{9.39} \]

Thus, applying the residue theorem yields
\[ \frac{1}{2}\int_{-\infty}^{\infty}\frac{dw}{(1+w^2)^2} = \frac{1}{2}\cdot 2\pi i\cdot\mathrm{Res}(i) = \pi i\lim_{w\to i}\frac{d}{dw}\frac{1}{(w+i)^2} = \frac{\pi}{4}. \tag{9.40} \]

From (9.38) and (9.40), we finally obtain
\[ I = -\frac{\pi}{4}. \]
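
As a sanity check on this subsection, the sketch below evaluates the real integrals $I = \int_{-\infty}^{\infty} x e^x/(1+e^{2x})^2\,dx$ and $\int_{-\infty}^{\infty} e^x/(1+e^{2x})^2\,dx$ directly; the residue calculation predicts $-\pi/4$ and $\pi/4$, respectively. This is an added illustration; the function names, the integration window, and the step size are assumptions.

```python
import math

def integrand_I(x):
    # x e^x / (1 + e^(2x))^2, rewritten for x > 0 to avoid overflow.
    if x > 0:
        return x * math.exp(-3 * x) / (1 + math.exp(-2 * x)) ** 2
    return x * math.exp(x) / (1 + math.exp(2 * x)) ** 2

def integrand_K(x):
    # e^x / (1 + e^(2x))^2, the integrand of (9.39).
    if x > 0:
        return math.exp(-3 * x) / (1 + math.exp(-2 * x)) ** 2
    return math.exp(x) / (1 + math.exp(2 * x)) ** 2

def trapezoid(func, a=-60.0, b=60.0, h=0.01):
    n = int(round((b - a) / h))
    total = 0.5 * (func(a) + func(b))
    for k in range(1, n):
        total += func(a + k * h)
    return h * total

if __name__ == "__main__":
    print(trapezoid(integrand_I), -math.pi / 4)
    print(trapezoid(integrand_K), math.pi / 4)
```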


9.3.2 Fresnel Integrals 
We would like to derive the equations 


\[ \int_0^{\infty}\cos(kx^2)\,dx = \int_0^{\infty}\sin(kx^2)\,dx = \frac{1}{2}\sqrt{\frac{\pi}{2k}} \]


with a real positive constant k. These are known as the Fresnel cosine 
integral and Fresnel sine integral. Integrals of this type are encountered 
in the study of a phenomenon called diffraction, which is exhibited by all types 
of waves such as light and sound. 

In this connection we consider the integral 


\[ I = \oint_C e^{ikz^2}\,dz \quad (k>0) \tag{9.41} \]
around the contour shown in Fig. 9.4. The integral variable z becomes $z=x$
on the segment along the real axis, $z = Re^{i\phi}$ $(0\le\phi\le\pi/4)$ along the large
(ultimately infinite) arc, and $z = x(1+i)$ along the slanted segment defined
by $y=x$. Therefore, with $(1+i)^2 = 2i$, we have
\[ \lim_{R\to\infty}\oint_C e^{ikz^2}\,dz = \int_0^{\infty} e^{ikx^2}\,dx + \lim_{R\to\infty}\int_0^{\pi/4} e^{ikR^2e^{2i\phi}}\,iRe^{i\phi}\,d\phi + (1+i)\int_{\infty}^{0} e^{-2kx^2}\,dx. \tag{9.42} \]

Our objective is to evaluate the real and imaginary parts of the first integral 
on the right-hand side of (9.42). Then, evaluations of the other integrals shown 
in (9.42) complete the computation. 

First, we readily obtain 


\[ \oint_C e^{ikz^2}\,dz = 0, \tag{9.43} \]
since there are no poles within the contour of Fig. 9.4.

Fig. 9.4. Contour for evaluating the integral (9.41)

Second, we consider the integral along the arc, which is given in the second
term on the right-hand side of (9.42). On the large arc, we have
\[ \left|iRe^{i\phi}e^{ikR^2e^{2i\phi}}\right| = \left|Re^{ikR^2\cos(2\phi)}e^{-kR^2\sin(2\phi)}\right| \le Re^{-kR^2\sin(2\phi)}, \]
where $\sin(2\phi)$ is always nonnegative in the range $0\le\phi\le\pi/4$.
Hence, for $0<\phi\le\pi/4$,
\[ \lim_{R\to\infty} Re^{-kR^2\sin(2\phi)} = 0, \tag{9.44} \]
so that the integral along the arc vanishes in the limit $R\to\infty$. In fact,
l'Hôpital's rule states that for $a>0$,
\[ \lim_{R\to\infty}\frac{R}{e^{aR^2}} = \lim_{R\to\infty}\frac{1}{2aRe^{aR^2}} = 0. \]

Finally we examine the integral along the slanted segment, i.e., the third
term on the right-hand side of (9.42). To evaluate it, we consider the quantity
\[ J = \left(\int_{-\infty}^{\infty} e^{-2kx^2}\,dx\right)^2 = \int_{-\infty}^{\infty} e^{-2kx^2}\,dx\int_{-\infty}^{\infty} e^{-2ky^2}\,dy = \int_{-\infty}^{\infty}dx\int_{-\infty}^{\infty}dy\,e^{-2k(x^2+y^2)}. \]

In terms of the polar coordinates, it yields
\[ J = \int_0^{\infty}dr\int_0^{2\pi}d\theta\,re^{-2kr^2} = 2\pi\int_0^{\infty} re^{-2kr^2}\,dr = \frac{\pi}{2k}, \]
and we have the Gaussian integral given by
\[ \int_{-\infty}^{\infty} e^{-2kx^2}\,dx = \sqrt{\frac{\pi}{2k}}, \]
so that
\[ (1+i)\int_{\infty}^{0} e^{-2kx^2}\,dx = -\frac{1+i}{2}\sqrt{\frac{\pi}{2k}}. \tag{9.45} \]

Substituting the results of (9.43), (9.44), and (9.45) into (9.42), we find that
\[ \int_0^{\infty} e^{ikx^2}\,dx = \frac{1+i}{2}\sqrt{\frac{\pi}{2k}}. \tag{9.46} \]

Writing the exponential in trigonometric form and equating the real and imaginary
parts of both sides of (9.46), we obtain the Fresnel integrals:
\[ \int_0^{\infty}\cos(kx^2)\,dx = \int_0^{\infty}\sin(kx^2)\,dx = \frac{1}{2}\sqrt{\frac{\pi}{2k}}. \]
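
A rough numerical check of the Fresnel integrals is sketched below. It is an added illustration and makes a crude assumption: the oscillatory integrals are simply truncated at a finite upper limit `x_max`, which leaves a truncation error of order $1/(k\,x_{\max})$, so only agreement to about two decimal places should be expected. The function name and parameters are hypothetical.

```python
import math

def fresnel(kind, k=1.0, x_max=100.0, h=0.0005):
    # Trapezoidal rule for the integral of cos(k x^2) or sin(k x^2) on [0, x_max].
    f = math.cos if kind == "cos" else math.sin
    n = int(round(x_max / h))
    total = 0.5 * (f(0.0) + f(k * x_max * x_max))
    for j in range(1, n):
        x = j * h
        total += f(k * x * x)
    return h * total

def fresnel_limit(k=1.0):
    # Closed form derived in the text: (1/2) sqrt(pi / (2k)).
    return 0.5 * math.sqrt(math.pi / (2 * k))

if __name__ == "__main__":
    print(fresnel("cos"), fresnel("sin"), fresnel_limit())
```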


9.3.3 Summation of Series


Our final application of the residue theorem is the summation of a series
$\sum_n f(n)$. Using this method, we can convert a certain type of series to
simple forms such as
\[ \sum_{n=-\infty}^{\infty}\frac{1}{(a+n)^2} = \frac{\pi^2}{\sin^2(\pi a)} \tag{9.47} \]
and
\[ \sum_{n=-\infty}^{\infty}\frac{2x}{x^2+n^2\pi^2} = 2\coth x. \]

This technique is particularly useful, for instance, to express a power series
solution of a differential equation in a simple closed form. In fact, this device
is generalized for various series summations as shown below.


Theorem:
An infinite series of functions $f(n)$ with respect to an integer n is given
by
\[ \sum_{n=-\infty}^{\infty} f(n) = -\sum_n \mathrm{Res}(g, a_n), \tag{9.48} \]
where $\mathrm{Res}(g, a_n)$ is the residue of the specific function
\[ g(z) = \frac{\pi f(z)}{\tan(\pi z)} \]
at the nth pole of $f(z)$ located at $z = a_n$.


According to this theorem, we see that if the number of poles of $f(z)$ is finite
and the values of $\mathrm{Res}(g, a_n)$ are readily obtained, the series on the left-hand
side of (9.48) is written in a simple form.


Proof The key point is to use a function given by $\pi/\tan(\pi z)$. This function
has simple poles at $z = 0, \pm 1, \pm 2, \cdots$, each with residue 1 evaluated as
\[ \lim_{z\to n}\frac{\pi(z-n)}{\tan(\pi z)} = \lim_{z\to n}\frac{\pi}{\pi/\cos^2(\pi z)} = 1, \]
where we used l'Hôpital's rule (see Exercise 3 in Sect. 8.1). In addition, the
function $\pi/\tan(\pi z)$ is bounded at infinity except on the real axis. To derive
(9.48), let us consider the contour integral
\[ \oint_{C_1}\frac{\pi f(z)}{\tan(\pi z)}\,dz \tag{9.49} \]
around the contour $C_1$ shown in Fig. 9.5. Here $f(z)$ is assumed to have no
branch points or essential singularities anywhere. Since only the pole at $z=0$
is found within $C_1$, the contour integral equals $2\pi i$ times the residue of the
integrand at $z=0$, which is $f(0)$, i.e.,
\[ \oint_{C_1}\frac{\pi f(z)}{\tan(\pi z)}\,dz = 2\pi i f(0). \]

Fig. 9.5. A sequence of rectangular contours to derive equation (9.48) 


Next, the integral around contour $C_2$ is
\[ \oint_{C_2}\frac{\pi f(z)}{\tan(\pi z)}\,dz = 2\pi i\left[f(0)+f(1)+f(-1)+\mathrm{Res}(g,a_1)\right], \]
where $\mathrm{Res}(g,a_1)$ stems from the contribution of the pole of $f(z)$ located at
$z = a_1$. Finally, for a contour at infinity, the integral must be
\[ \oint_{C_\infty}\frac{\pi f(z)}{\tan(\pi z)}\,dz = 2\pi i\left[\sum_{n=-\infty}^{\infty} f(n) + \sum_n\mathrm{Res}(g,a_n)\right]. \tag{9.50} \]

If $|zf(z)|\to 0$ as $|z|\to\infty$, the infinite contour integral is zero so that we
successfully obtain the equation:
\[ \sum_{n=-\infty}^{\infty} f(n) = -\sum_n\mathrm{Res}(g,a_n). \tag{9.51} \]
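
The two closed forms quoted at the start of this subsection, $\sum_{n=-\infty}^{\infty} 1/(a+n)^2 = \pi^2/\sin^2(\pi a)$ and $\sum_{n=-\infty}^{\infty} 2x/(x^2+n^2\pi^2) = 2\coth x$, can be compared with symmetric partial sums. The sketch below is an added illustration; the function names and the cutoff N are assumptions, and the partial sums approach the limit only at a rate of order 1/N.

```python
import math

def sum_inverse_squares(a, N=200000):
    # Symmetric partial sum of 1 / (a + n)^2 for n = -N, ..., N.
    return sum(1.0 / (a + n) ** 2 for n in range(-N, N + 1))

def sum_lorentzians(x, N=200000):
    # Symmetric partial sum of 2x / (x^2 + n^2 pi^2) for n = -N, ..., N.
    return sum(2 * x / (x * x + (n * math.pi) ** 2) for n in range(-N, N + 1))

if __name__ == "__main__":
    a, x = 0.3, 1.5
    print(sum_inverse_squares(a), math.pi ** 2 / math.sin(math.pi * a) ** 2)
    print(sum_lorentzians(x), 2 / math.tanh(x))
```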


9.3.4 Langevin and Riemann zeta Functions 


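Our present aim is to establish the equivalence between Langevin's
function,
\[ \coth x - (1/x), \]
and the sum
\[ \sum_{n=1}^{\infty}\frac{2x}{x^2+n^2\pi^2}. \]
Letting $f(z) = 2x/(x^2+z^2\pi^2)$, and using the above equation, we obtain
\[ \sum_{n=-N}^{N}\frac{2x}{x^2+n^2\pi^2} = \frac{1}{2\pi i}\oint_C\pi\cot(\pi z)f(z)\,dz - \sum_{\text{poles of } f}\mathrm{Res}\left[\pi\cot(\pi z)f(z)\right], \]
where C is a closed contour, say, a rectangle, enclosing the points $z = 0, \pm 1, \cdots$.
Now let the length and width of the rectangle C approach ∞.
As this happens,
\[ \left|\frac{1}{2\pi i}\oint_C\pi\cot(\pi z)f(z)\,dz\right| \le \frac{1}{2\pi}\oint_C|\pi\cot\pi z|\left|\frac{2x}{x^2+z^2\pi^2}\right||dz| \to 0. \tag{9.52} \]

Hence, we have
\[ \sum_{n=-\infty}^{\infty}\frac{2x}{x^2+n^2\pi^2} = -\sum\mathrm{Res}\left[\pi\cot(\pi z)\frac{2x}{x^2+z^2\pi^2}\right]_{z=\pm ix/\pi} = -\frac{2x}{\pi}\left[\frac{\cot(ix)}{2ix/\pi}+\frac{\cot(-ix)}{-2ix/\pi}\right] = 2i\cot(ix) = 2\coth x. \]
This result can be rewritten as
\[ \frac{2}{x} + 2\sum_{n=1}^{\infty}\frac{2x}{x^2+n^2\pi^2} = 2\coth x \]
or
\[ \coth x - \frac{1}{x} = \sum_{n=1}^{\infty}\frac{2x}{x^2+n^2\pi^2}, \tag{9.53} \]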


which establishes the result we stated at the outset. 


Remark. To see that the integral in (9.52) vanishes as $|z|\to\infty$, we observe that
\[ |\cot\pi z| = \frac{|\cos\pi z|}{|\sin\pi z|} = \sqrt{\frac{\cos^2\pi x+\sinh^2\pi y}{\sin^2\pi x+\sinh^2\pi y}}. \]

If we choose the rectangle whose vertical sides cross the x-axis at a large
enough half-integer, say, $x = 10^5+\frac{1}{2}$ so that $\cos\pi x = 0$ and $\sin\pi x = 1$, then
over these sides of the rectangle
\[ |\cot\pi z| = \sqrt{\frac{\sinh^2\pi y}{1+\sinh^2\pi y}} = |\tanh\pi y| \le 1. \]

Over the horizontal sides of the rectangle, $\lim_{y\to\pm\infty}|\cot\pi z| = 1$. Thus the
integrand goes as $|1/z^2|$ as $|z|\to\infty$, and the integral vanishes.


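If we integrate both sides of (9.53) from 0 to x, we get
\[ \sum_{n=1}^{\infty}\log\left(1+\frac{x^2}{n^2\pi^2}\right) = \log\frac{\sinh x}{x}. \]

Hence,
\[ \frac{\sinh x}{x} = \prod_{n=1}^{\infty}\left(1+\frac{x^2}{n^2\pi^2}\right). \]

We may extend this result to all z in the complex plane by analytic continuation.
Then setting $x = i\theta$ with $\theta$ real, we obtain
\[ \sin\theta = \theta\prod_{n=1}^{\infty}\left(1-\frac{\theta^2}{n^2\pi^2}\right). \]

This infinite product formula displays all the zeros of $\sin\theta$ explicitly. It represents
the complete factorization of the Taylor series and can, in fact, be taken
as the definition of the sine function.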
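
The infinite products for $\sinh x$ and $\sin\theta$ converge slowly but can still be checked with truncated products. The sketch below is an added illustration; the function names and the truncation order N are assumptions, and the relative truncation error is roughly $\theta^2/(\pi^2 N)$.

```python
import math

def sine_product(theta, N=100000):
    # theta * prod_{n=1}^{N} (1 - theta^2 / (n^2 pi^2))
    p = theta
    for n in range(1, N + 1):
        p *= 1 - (theta / (n * math.pi)) ** 2
    return p

def sinh_product(x, N=100000):
    # x * prod_{n=1}^{N} (1 + x^2 / (n^2 pi^2))
    p = x
    for n in range(1, N + 1):
        p *= 1 + (x / (n * math.pi)) ** 2
    return p

if __name__ == "__main__":
    print(sine_product(1.0), math.sin(1.0))
    print(sinh_product(2.0), math.sinh(2.0))
```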
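By equating coefficients of the $\theta^3$ term of both sides of the above equation,
we obtain a useful sum:
\[ \sum_{n=1}^{\infty}\frac{1}{n^2} = \frac{\pi^2}{6}, \]
which is the special value $\zeta(2)$ of the Riemann zeta function $\zeta(s) = \sum_{n=1}^{\infty} n^{-s}$.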


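Exercises


1. Evaluate $\sum_{n=-\infty}^{\infty}\frac{1}{(a+n)^2}$ by considering the contour integral
\[ \oint_C\frac{\pi}{\tan(\pi z)}\,\frac{1}{(a+z)^2}\,dz, \]
where a is not an integer and C is a circle of large radius.
Solution: In order to use equation (9.48), we define
\[ f(z) = \frac{1}{(a+z)^2}, \qquad g(z) = \frac{\pi f(z)}{\tan(\pi z)} = \frac{\pi}{\tan(\pi z)}\,\frac{1}{(a+z)^2}. \]

Since the integrand $g(z)$ has simple poles at $z = 0, \pm 1, \pm 2, \cdots$ and
a double pole at $z=-a$, evaluation of $\mathrm{Res}(g,-a)$ completes the
problem [see (9.48)]. To find the residue at $z=-a$, set $z = -a+\xi$
for small $\xi$ and determine the coefficient of $\xi^{-1}$:
\[ \frac{\pi}{\tan(\pi z)}\,\frac{1}{(a+z)^2} = \frac{\pi}{\xi^2}\,\frac{1}{\tan(-a\pi+\xi\pi)} = \frac{\pi}{\xi^2}\left\{\left.\frac{1}{\tan(\pi z)}\right|_{z=-a} + \xi\left.\frac{d}{dz}\frac{1}{\tan(\pi z)}\right|_{z=-a} + \cdots\right\}. \]

It follows from the expansion above that the residue at the double pole $z=-a$
is
\[ \pi\left.\frac{d}{dz}\frac{1}{\tan(\pi z)}\right|_{z=-a} = \pi\left.\frac{-\pi}{\sin^2(\pi z)}\right|_{z=-a} = -\frac{\pi^2}{\sin^2(\pi a)}. \]
Therefore, it is readily seen from (9.48) that
\[ \sum_{n=-\infty}^{\infty}\frac{1}{(a+n)^2} = \frac{\pi^2}{\sin^2\pi a}. \]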


9.4 Argument Principle 


9.4.1 The Principle 


It may occur that a function $f(z)$ has several zeros and poles simultaneously
in a domain D. If we denote the number of such zeros and poles by $N_0$ and
$N_\infty$, respectively, these numbers are related to one another as stated below.


Argument principle:
Let $f(z)$ be an analytic function within a closed contour C except at a
finite number of poles. If $f(z)\ne 0$ on C, then
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = N_0-N_\infty, \tag{9.54} \]
where $N_0$ and $N_\infty$ are the numbers of zeros and poles of $f(z)$ in C, respectively.
Both zeros and poles are to be counted with their multiplicities.


Proof By the residue theorem, the integral
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz \]
is equal to the sum of the residues of the logarithmic derivative of $f(z)$ in D,
i.e.,
\[ g(z) = \frac{f'(z)}{f(z)} = \frac{d[\log f(z)]}{dz}. \]
The only possible singularities of $g(z)$ in D coincide with the zeros and poles
of $f(z)$. In order to determine the residue of $g(z)$ at a zero of $f(z)$, we observe
that in the neighborhood of a zero a of the nth order, $f(z)$ has an expansion
\[ f(z) = (z-a)^n\left[c_0+c_1(z-a)+\cdots\right], \quad c_0\ne 0. \]

We therefore have
\[ f(z) = (z-a)^n f_1(z), \]
where $f_1(z)\ne 0$ in a certain neighborhood of $z=a$. Hence,
\[ \log f(z) = n\log(z-a)+\log f_1(z), \]
and
\[ \frac{f'(z)}{f(z)} = \frac{n}{z-a}+\frac{f_1'(z)}{f_1(z)}, \]
where the last term is analytic at $z=a$. It follows that the residue of $g(z)$,
which is called the logarithmic residue of $f(z)$, at $z=a$ is n, i.e., it is equal
to the order of the zero of $f(z)$ at $z=a$. If the zeros of $f(z)$ in D are counted
with their multiplicities, the sum of the logarithmic residues of $f(z)$ at the
zeros of $f(z)$ in D will be equal to the number of zeros.

We now turn to the poles of $f(z)$ in D. If $z=b$ is a pole of order m, we
have near it an expansion
\[ f(z) = \frac{c_1}{(z-b)^m}+\frac{c_2}{(z-b)^{m-1}}+\cdots = \frac{1}{(z-b)^m}\left[c_1+c_2(z-b)+\cdots\right] = \frac{f_2(z)}{(z-b)^m}, \]
where $f_2(z)$ is analytic at $z=b$ and $f_2(b)\ne 0$. Hence,
\[ \frac{f'(z)}{f(z)} = -\frac{m}{z-b}+\frac{f_2'(z)}{f_2(z)}, \]
which shows that the logarithmic residue of $f(z)$ at a pole of $f(z)$ of order m
is $-m$. If the poles of $f(z)$ in D are counted with their multiplicities, the sum
of the logarithmic residues of $f(z)$ at the poles of $f(z)$ in D will be equal to
minus the number of these poles. Since $g(z)$ has no singularities in D except
at the zeros and poles of $f(z)$, we have proven our theorem.


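Remark. If we replace $f(z)$ in (9.54) by $f(z)-a$, this formula will yield the
difference between the number of zeros and the poles of $f(z)-a$. Since the
latter are identical with the poles of $f(z)$, we find that
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)-a}\,dz = N_a-N_\infty, \]
where $N_a$ indicates how often the value of a is taken by $f(z)$ in D.


Examples 1. For $f(z) = z^2$ and C: |z| = 1, $N_0 = 2$ and $N_\infty = 0$ so that we
have
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = 2. \]
In fact, the integral reads
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = \frac{1}{2\pi i}\oint_C\frac{2z}{z^2}\,dz = \frac{2}{2\pi i}\cdot 2\pi i = 2. \]
2. For $f(z) = z/(z-a)$ and C: |z| = R, $N_0 = 1$ and
\[ N_\infty = \begin{cases} 1 & \text{if } R>a, \\ 0 & \text{if } R<a. \end{cases} \]

Hence, we have
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = \begin{cases} 0 & \text{if } R>a, \\ 1 & \text{if } R<a. \end{cases} \tag{9.55} \]

Indeed, $f(z) = 1+[a/(z-a)]$, $f'(z) = -a/(z-a)^2$, $f'/f = (1/z)-[1/(z-a)]$,
which yields (9.55).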
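
The two examples can be reproduced numerically by discretizing $\frac{1}{2\pi i}\oint_C f'(z)/f(z)\,dz$ on a circle centered at the origin. The sketch below is an added illustration; the helper name `winding_integral`, the choice $a = 0.5$, and the number of sample points are assumptions.

```python
import cmath
import math

def winding_integral(f, df, radius, n=4000):
    # (1/(2*pi*i)) * integral of f'(z)/f(z) over |z| = radius (trapezoidal rule).
    total = 0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        total += df(z) / f(z) * z   # (f'/f) dz / (i dt), with dz = i z dt
    return total / n

a = 0.5  # hypothetical pole position for Example 2
square = (lambda z: z * z, lambda z: 2 * z)                      # f(z) = z^2
ratio = (lambda z: z / (z - a), lambda z: -a / (z - a) ** 2)     # f(z) = z/(z-a)

if __name__ == "__main__":
    print(winding_integral(*square, 1.0))   # N_0 - N_inf = 2 - 0
    print(winding_integral(*ratio, 1.0))    # R > a: 1 - 1
    print(winding_integral(*ratio, 0.25))   # R < a: 1 - 0
```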


9.4.2 Variation of the Argument


Equation (9.54) can be brought into a different form in which its geometric
character becomes more apparent. If we write
\[ \varphi = \arg f(z), \qquad f(z) = |f(z)|e^{i\varphi}, \]
we obtain
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = \frac{1}{2\pi i}\oint_C d[\log f(z)] = \frac{1}{2\pi i}\oint_C\left[d\log|f(z)|+i\,d\varphi\right] = \frac{1}{2\pi i}\oint_C d\log|f(z)|+\frac{1}{2\pi}\oint_C d\varphi. \]

Recall that $\log w$ is a many-valued function of w. If $\log w$ is continued
along a closed curve that surrounds the origin, we shall not return to the value
of $\log w$ with which we started. However, this many-valuedness is confined
to $\mathrm{Im}(\log w) = \arg w$, i.e., $\mathrm{Re}(\log w) = \log|w|$ is single-valued. If we write
$w = f(z)$, it follows that
\[ \oint_C d\log|f(z)| = 0. \]
In fact,
\[ \int_{z_1}^{z_2} d\log|f(z)| = \log|f(z_2)|-\log|f(z_1)|, \]
and if the integration is performed over a closed contour, the terminals $z_1$
and $z_2$ of the integration coincide; moreover, owing to the single-valuedness of
$\log|f(z)|$, the value of the integral is zero. Hence, we have
\[ \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz = \frac{1}{2\pi}\oint_C d\varphi, \tag{9.56} \]
where $\varphi = \arg f(z)$.


To interpret (9.56), we observe that
\[ \int_{z_1}^{z_2} d\varphi = \varphi(z_2)-\varphi(z_1) \]
is the quantitative change in the argument of $f(z)$, which is called the
variation of the argument of $f(z)$. The integral $\oint_C d\varphi$ is therefore the
total variation of $\arg f(z)$ if z describes the entire boundary C of the domain
D. It is clear that the value of this integral must be an integral multiple of
2π. If z describes C, the point $f(z)$ describes a closed curve C′, and if C′
surrounds the origin m times in the positive (counterclockwise) direction, the
increase in $\arg f(z)$ along C′ will be 2mπ. In view of (9.54) and (9.56), we
obtain the theorem below.


Theorem:

Let the domain D be bounded by one or more closed contours C and let
a function $f(z)$ be single-valued and analytic apart from a finite number
of poles. If $N_0$ and $N_\infty$ denote the number of zeros and poles of $f(z)$ in D,
respectively, and $f(z)\ne 0$ on C, then
\[ \frac{1}{2\pi}\Delta_C = N_0-N_\infty, \]
where $\Delta_C$ denotes the total variation of $\arg f(z)$.


9.4.3 Extension of the Argument Principle


The argument principle can be extended to the case in which $f(z)$ has zeros
or poles on the boundary C of the domain D. Suppose that $f(z_0) = 0$, where
$z_0$ is situated on C. Let $f(z)$ be analytic at $z_0$; then we have
\[ f(z) = (z-z_0)^m f_1(z), \qquad f_1(z_0)\ne 0, \]
if m is the multiplicity of the zero. In view of the relation
\[ \log f(z) = m\log(z-z_0)+\log f_1(z), \]
it follows that
\[ \arg f(z) = m\arg(z-z_0)+\arg f_1(z). \]

At $z = z_0$, $f_1(z)\ne 0$ and $\log f_1(z)$ is analytic. Hence, $\arg f_1(z)$ will vary
continuously if z varies along C and passes through $z = z_0$, but the expression
$\arg(z-z_0)$ shows a different behavior. Since this is the angle between the
parallel to the positive axis through $z_0$ and the linear segment drawn from $z_0$
to z, it is clear that if $z_0$ is passed $\arg(z-z_0)$ jumps by the amount π. The
contribution of this zero to $\arg f(z)$ will be $m\pi$, i.e., one-half of what it would
have been if the zero were situated in the interior of D. If $z = z_0$ is a pole of
order m, its contribution to $\arg f(z)$ will be $-m\pi$. This follows immediately
from the fact that $f(z)^{-1}$ has a zero of order m at $z_0$ and that
\[ \log[f(z)^{-1}] = -\log f(z). \]

We therefore have the following extension of the argument principle.


Extended argument principle:

The argument principle remains valid if $f(z)$ has poles and zeros on the
boundary, provided that these poles and zeros are counted with half their
multiplicities.


9.4.4 Rouché Theorem 


As an application of the argument principle, we prove the following result, 
known as the Rouché theorem. 


Rouché theorem:

If the functions $f(z)$ and $g(z)$ are analytic and single-valued in a domain
D and on its boundary C and if $|g(z)|<|f(z)|$ on C, then the number of
zeros of the function $f(z)+g(z)$ within D is equal to that of zeros of $f(z)$.


Proof We have
\[ \log[f(z)+g(z)] = \log f(z)+\log\left[1+\frac{g(z)}{f(z)}\right], \]
whence
\[ \arg[f(z)+g(z)] = \arg f(z)+\arg\left[1+\frac{g(z)}{f(z)}\right]. \tag{9.57} \]
On the contour C, we have
\[ \left|\frac{g(z)}{f(z)}\right| < 1. \]
It thus follows that the points
\[ w = 1+\frac{g(z)}{f(z)}, \quad z\in C \tag{9.58} \]
are all situated in the interior of the circle $|1-w|<1$. Since this circle does not
contain the origin, the curve (9.58) cannot surround that point. As a result,
the total variation of the argument of (9.58) along C is zero. Hence, by (9.57),
we have
\[ \Delta_C[f(z)+g(z)] = \Delta_C[f(z)]. \]

Since neither $f(z)$ nor $f(z)+g(z)$ has poles in D, it follows from (9.54) that
these two functions have the same number of zeros in D.


The application of Rouché's theorem is illustrated by the following short
proof of the maximum principle. If $f(z)$ is analytic in D + C and there is a
point $z_0$ in D such that
\[ |f(z)| < |f(z_0)| \quad \text{for } z\in C, \]
then it follows from Rouché's theorem that the functions $f(z_0)-f(z)$ and $f(z_0)$
have the same number of zeros in D. However, the nonzero constant $f(z_0)$ has no zeros,
whereas the function $f(z_0)-f(z)$ has at least one zero there, namely, at $z = z_0$.
The assumption that $|f(z)|<|f(z_0)|$ for $z\in C$ thus leads to a contradiction.


Exercises 


1. Let $z_j$ be the zeros of a function $f(z)$ that is analytic in a circular domain
D and let $f(z)\not\equiv 0$. Each zero is counted as many times as its multiplicity.
Prove that for every closed curve C in D that does not pass through a
zero, the sum of winding numbers yields
\[ \sum_j n(C, z_j) = \frac{1}{2\pi i}\oint_C\frac{f'(z)}{f(z)}\,dz. \tag{9.59} \]

Solution: From hypothesis, we can write $f(z) = (z-z_1)(z-z_2)\cdots(z-z_n)g(z)$,
where $g(z)$ is analytic and $g(z)\ne 0$ in D.
Forming the logarithmic derivative, we obtain
\[ \frac{f'(z)}{f(z)} = \frac{1}{z-z_1}+\frac{1}{z-z_2}+\cdots+\frac{1}{z-z_n}+\frac{g'(z)}{g(z)} \]
for $z\ne z_j$, and particularly on C. Since $g(z)\ne 0$ in D, Cauchy's
theorem yields $\oint_C g'(z)/g(z)\,dz = 0$. Recalling the definition of
$n(C, z_j)$, we get the desired result (9.59).
2. Show that an analytic function in a domain D that takes only real values 
on the boundary C of D reduces to a constant. 

Solution: Let $\xi = a + ib$, $b \neq 0$, be a nonreal complex number 
and consider the values of $f(z) - \xi$ for $z \in C$. If b > 0, say, we 
have $\mathrm{Im}[f(z) - \xi] = -b < 0$ since f(z) is real on C. The values of 
$f(z) - \xi$ are thus confined to the lower half-plane so that the 
curve described by $f(z) - \xi$ cannot surround the origin. Hence, we 
have $\Delta_C[f(z) - \xi] = 0$. Furthermore, since $f(z) - \xi$ is analytic in 
D + C, it follows from the argument principle that $f(z) - \xi \neq 0$, 
i.e., $f(z) \neq \xi$ in D. The same reasoning also applies to values $\xi$ for 
which b < 0. We thus conclude that f(z) does not take nonreal 
values in D. 




Next we show that the above result means that f(z) reduces to a 
constant. Since f(z) is analytic in D, we have 

\[ f'(z) = \lim_{h\to 0}\frac{f(z + h) - f(z)}{h} = \lim_{h\to 0}\frac{f(z + ih) - f(z)}{ih}, \]

where h → 0 through positive values. Since f(z) is real throughout 
D, the first limit is real and the second limit is purely imaginary. They can 
therefore be equal only if they are both zero. Since z is arbitrary, 
it follows that f'(z) = 0 throughout D; hence, f(z) = const. ♣ 


3. Show that all zeros of polynomials 

\[ p(z) = z^n + a_{n-1}z^{n-1} + \cdots + a_1 z + a_0 \]

are located within the region $|z| \le R_0$, where 

\[ R_0 = \max\{1 + |a_{n-1}|,\ 1 + |a_{n-2}|,\ \cdots,\ 1 + |a_1|,\ |a_0|\}. \]

Solution: Let $f(z) = z^n$, $g(z) = a_{n-1}z^{n-1} + \cdots + a_1 z + a_0$, and 
let $R_k = R_0 + (1/k)$ for an arbitrary fixed $k \in \mathbb{N}$. Observe that 

\[ |a_j| \le R_0 - 1 < R_k - 1 \quad \text{for } j = 1, 2, \cdots, n-1 \]

and $|a_0| \le R_0 < R_k$. Then, if $|z| = R_k$, we have 

\[ |g(z)| \le |a_{n-1}||z|^{n-1} + \cdots + |a_1||z| + |a_0| 
   < (R_k - 1)R_k^{n-1} + \cdots + (R_k - 1)R_k + R_k = R_k^n = |f(z)|. \]

In view of Rouché's theorem, f(z) and f(z) + g(z) = p(z) have 
the same number of zeros within the region $|z| < R_k$. Since f(z) 
has n zeros there (counted with multiplicity) and p(z) is an nth-order 
polynomial, we conclude that all the zeros of p(z) have to be located 
within the region $|z| < R_k$. Finally, we take the limit k → ∞ (since k 
is arbitrary) to find that all the zeros of p(z) have to be located 
within $|z| \le R_0$. ♣ 
4. Show that the solutions of the equation $z^3 + 3z + 1 = 0$ have absolute 
values less than 2. 

Solution: Let z be on the circle |z| = 2. Then we have 

\[ |z^3| = 8 > 3\cdot 2 + 1 = 3|z| + 1 \ge |3z + 1|. \]

By Rouché's theorem, $z^3 + 3z + 1$ therefore has as many zeros within 
|z| < 2 as $z^3$ does. This means that there are three solutions to the 
equation $z^3 + 3z + 1 = 0$ and that all of them satisfy |z| < 2. ♣ 
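The zero counts claimed in Exercises 3 and 4 can be checked numerically with the argument principle. The following is a minimal sketch, not part of the original text: the helper names `count_zeros` and `root_bound` are hypothetical, and the sample count is an arbitrary choice that is assumed to be fine enough for the phase of p(z) to change by less than π between neighbouring sample points.

```python
import cmath
import math

def count_zeros(f, radius, samples=20000):
    """Zeros of an analytic f inside |z| < radius, from the change of arg f(z) along the circle."""
    total = 0.0
    prev = f(complex(radius, 0.0))
    for k in range(1, samples + 1):
        cur = f(radius * cmath.exp(2j * math.pi * k / samples))
        total += cmath.phase(cur / prev)  # small phase step in (-pi, pi]
        prev = cur
    return round(total / (2.0 * math.pi))

def root_bound(coeffs):
    """R_0 of Exercise 3 for z^n + a_{n-1} z^{n-1} + ... + a_0, with coeffs = [a_0, ..., a_{n-1}]."""
    return max([abs(coeffs[0])] + [1.0 + abs(a) for a in coeffs[1:]])

def p(z):
    return z**3 + 3*z + 1

r0 = root_bound([1, 3, 0])        # a_0 = 1, a_1 = 3, a_2 = 0
inside_2 = count_zeros(p, 2.0)    # circle used in Exercise 4
inside_1 = count_zeros(p, 1.0)    # a smaller circle, for comparison
print(r0, inside_2, inside_1)
```

For this polynomial the bound of Exercise 3 gives R_0 = 4, whereas the direct estimate of Exercise 4 already confines all three zeros to |z| < 2; the smaller circle |z| = 1 encloses only one of them.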




9.5 Dispersion Relations 


9.5.1 Principal Value Integrals 


The previous sections treated contour integrals whose integrand has no pole 
on the contour C. If a pole is located on C, the integrand diverges at the 
pole so that we cannot use ordinary integration methods. This difficulty is 
overcome by introducing a new concept called the principal value integral. 
To derive it, we consider an integral 

\[ I = \oint_C \frac{f(z)}{z - a}\,dz, \tag{9.60} \]

with the integration contour depicted in Fig. 9.6. In (9.60), a is assumed to 
be real without loss of generality. In addition, we assume that f(z) is analytic 
for Im z ≥ 0, and behaves as $|z|^{\beta}|f(z)| \to A$ ($\beta > 0$) as |z| → ∞ there. In order 
for the integral (9.60) to be defined, the contour C must be traversed in such 
a way as to avoid the pole at z = a. Then, since both f(z) and 1/(z - a) are 
analytic within and on C, (9.60) equals zero. Therefore, by breaking it up, we 
obtain the following expression: 

\[
\begin{aligned}
&\oint_C \frac{f(z)}{z - a}\,dz \\
&= \int_{-R}^{a-r}\frac{f(x)}{x - a}\,dx + \int_{\gamma}\frac{f(z)}{z - a}\,dz
 + \int_{a+r}^{R}\frac{f(x)}{x - a}\,dx + \int_{\Gamma}\frac{f(z)}{z - a}\,dz \\
&= 0.
\end{aligned}
\tag{9.61}
\]

Here r is the radius of the small semicircle γ centered at x = a and R is 
the radius of the large semicircle Γ centered at the origin. The radius r 
can be chosen as small as we please and R can be chosen as large as we 
please. 

Fig. 9.6. Integration contour on which the pole of the integrand is located 

Our current interest is to determine the value to which the sum of the four 
integrals appearing in the second line of (9.61) converges in the limits r → 0 
and R → ∞. This is seen by evaluating the integrals along γ and Γ given in 
(9.61). First, once we set $z = Re^{i\theta}$, the integral along the large semicircle Γ 
yields 

\[ \int_{\Gamma}\frac{f(z)}{z - a}\,dz = \int_0^{\pi}\frac{f(Re^{i\theta})\, iRe^{i\theta}}{Re^{i\theta} - a}\,d\theta; \]

hence, 

\[ \left|\int_{\Gamma}\frac{f(z)}{z - a}\,dz\right| \le \frac{R}{|R - a|}\int_0^{\pi}\left|f(Re^{i\theta})\right| d\theta, \tag{9.62} \]

where we have used the inequality (written for a ≥ 0; for a < 0, replace a by |a| 
in the bound) 

\[ |Re^{i\theta} - a| = \sqrt{R^2 + a^2 - 2Ra\cos\theta} \ge \sqrt{R^2 + a^2 - 2Ra} = |R - a|. \]

In the limit R → ∞, the right-hand side of (9.62) vanishes since β > 0. 
Therefore, the integral over the semicircle Γ can be made arbitrarily small by 
choosing R sufficiently large. 

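Next, we write the integral along γ as 

\[ \int_{\gamma}\frac{f(z)}{z - a}\,dz = f(a)\int_{\gamma}\frac{dz}{z - a} + \int_{\gamma}\frac{f(z) - f(a)}{z - a}\,dz. \tag{9.63} \]

By setting $z - a = re^{i\theta}$, the first integral on the right-hand side is evaluated 
as 

\[ f(a)\int_{\gamma}\frac{dz}{z - a} = i f(a)\int_{\pi}^{0} d\theta = -i\pi f(a). \]

In addition, the Taylor series expansion of f(z) around z = a yields 

\[ \frac{f(z) - f(a)}{z - a}\,dz = f'(a)\, i r e^{i\theta}\,d\theta + \frac{f''(a)}{2}\, r e^{i\theta}\, i r e^{i\theta}\,d\theta + \cdots = O(r), \]

which means that the second integral in (9.63) vanishes in the limit r → 0. 
Equation (9.61) thus yields 

\[ \lim_{R\to\infty}\lim_{r\to 0}\left[\int_{-R}^{a-r}\frac{f(x)}{x - a}\,dx + \int_{a+r}^{R}\frac{f(x)}{x - a}\,dx\right] - i\pi f(a) = 0. \tag{9.64} \]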

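Now we introduce a new notation as shown below. 


@ Principal value integral: 
The notation 

\[ P\int_{-R}^{R}\frac{f(x)}{x - a}\,dx \equiv \lim_{r\to 0}\left[\int_{-R}^{a-r}\frac{f(x)}{x - a}\,dx + \int_{a+r}^{R}\frac{f(x)}{x - a}\,dx\right] \]

provides the principal value integral (or the Cauchy principal value) 
of f(x)/(x - a) for real a. 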



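With this notation, (9.64) reads 

\[ \lim_{R\to\infty} P\int_{-R}^{R}\frac{f(x)}{x - a}\,dx = i\pi f(a), \]

where f(x) is a complex-valued function of a real variable x. For the sake of 
brevity, we write this simply as 

\[ P\int_{-\infty}^{\infty}\frac{f(x)}{x - a}\,dx = i\pi f(a). \tag{9.65} \]
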
This result provides a way to evaluate contour integrals involving 
singularities on the integration path. When we decompose f(x) in (9.65) as 
$f(x) = f_R(x) + i f_I(x)$ and equate the real and imaginary parts, we obtain an 
important relation between $f_R$ and $f_I$: 


@ Hilbert transform pair: 
A pair of functions $f_R$ and $f_I$ that satisfies the relations 

\[
\begin{aligned}
f_R(a) &= \frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{f_I(x)}{x - a}\,dx, \\
f_I(a) &= -\frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{f_R(x)}{x - a}\,dx
\end{aligned}
\tag{9.66}
\]

is called a Hilbert transform pair. 
It readily follows from (9.66) that if $f_I(x) \equiv 0$, then $f_R(x) \equiv 0$. 


9.5.2 Several Remarks 


The principal value integral is seen as a way of avoiding singularities on a 
path of integration: we integrate up to the point just before the singularity in 
question, skip over the singularity, and begin integrating again immediately 
beyond the singularity. This prescription enables us to make sense out of 
integrals such as 

\[ \int_{-R}^{R}\frac{dx}{x}. \tag{9.67} \]

At first glance, this integral seems to be zero, since an odd function is integrated 
over a symmetric domain. However, the singularity at the origin makes the 
integral meaningless unless we insert a symbol P in front of it. Following the 
prescription for principal value integrals, we can easily evaluate the principal 
value of (9.67): 

\[ P\int_{-R}^{R}\frac{dx}{x} = \lim_{r\to 0}\left[\int_{-R}^{-r}\frac{dx}{x} + \int_{r}^{R}\frac{dx}{x}\right]. \]

In the first integral on the right-hand side, we set x = -y. Then 

\[ P\int_{-R}^{R}\frac{dx}{x} = \lim_{r\to 0}\left[\int_{R}^{r}\frac{dy}{y} + \int_{r}^{R}\frac{dx}{x}\right], \]

where the two integrals within the brackets obviously cancel out. 
Consequently, we have 

\[ P\int_{-R}^{R}\frac{dx}{x} = 0. \tag{9.68} \]

We emphasize again that the integral (9.68) is completely different from the 
meaningless quantity in (9.67). 
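The role of the symbol P can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than part of the original argument: it uses a plain midpoint rule (the helper names `midpoint` and `excised_integral` are hypothetical) and fixed cut-off widths. It removes a small interval around x = 0 from the integral of 1/x, once symmetrically, as the principal value prescribes, and once with a left gap twice as wide as the right one.

```python
import math

def midpoint(f, a, b, n=100000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def excised_integral(f, R, r_left, r_right):
    """Integral of f over [-R, -r_left] plus [r_right, R]; the singular point 0 is skipped."""
    return midpoint(f, -R, -r_left) + midpoint(f, r_right, R)

def inverse(x):
    return 1.0 / x

symmetric = excised_integral(inverse, 1.0, 0.01, 0.01)  # principal value prescription
lopsided = excised_integral(inverse, 1.0, 0.02, 0.01)   # unequal gaps
print(symmetric, lopsided, math.log(2.0))
```

With equal gaps the two halves cancel, in agreement with (9.68). With unequal gaps the result is ln(r_left/r_right) = ln 2 no matter how small the gaps are made, which is why (9.67) has no meaning without the symmetric prescription.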
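As a further step, we evaluate the principal value integral defined by 

\[ P\int_{-R}^{R}\frac{f(x)}{x - a}\,dx \quad (-R < a < R). \]

It follows from the result of (9.83) that 

\[
\begin{aligned}
P\int_{-R}^{R}\frac{f(x)}{x - a}\,dx
&= f(a)\, P\int_{-R}^{R}\frac{dx}{x - a} + P\int_{-R}^{R}\frac{f(x) - f(a)}{x - a}\,dx \\
&= f(a)\ln\left(\frac{R - a}{R + a}\right) + P\int_{-R}^{R}\frac{f(x) - f(a)}{x - a}\,dx.
\end{aligned}
\tag{9.69}
\]

It often happens that the second integral in the second line of (9.69) is not 
singular at x = a; for instance, this is the case when f(x) is differentiable 
at x = a. In this case, the symbol P there can be dropped. 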
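The decomposition (9.69) also serves as a practical recipe for computing principal values: the logarithm carries the singular part and the remaining integrand is regular. A minimal sketch follows, assuming f is differentiable at a and -R < a < R; the function names are hypothetical, the midpoint rule is an arbitrary choice, and the central difference used when a node falls on x = a is a safeguard added here, not something taken from the text.

```python
import math

def midpoint(g, lo, hi, n=4000):
    """Composite midpoint rule for the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def pv_subtracted(f, a, R, n=4000):
    """P-integral of f(x)/(x - a) over [-R, R] via (9.69), assuming f is differentiable at a."""
    fa = f(a)

    def regular(x):
        if abs(x - a) < 1e-9:
            # Limit of the difference quotient, estimated by a central difference.
            d = 1e-5
            return (f(a + d) - f(a - d)) / (2.0 * d)
        return (f(x) - fa) / (x - a)

    return fa * math.log((R - a) / (R + a)) + midpoint(regular, -R, R, n)

a, R = 0.5, 2.0
numeric = pv_subtracted(lambda x: x * x, a, R)
# For f(x) = x^2 the integrand is x + a + a^2/(x - a), which can be integrated by hand.
exact = 2.0 * a * R + a * a * math.log((R - a) / (R + a))
print(numeric, exact)
```

For f(x) = 1 the regular part vanishes and the routine reduces to the logarithm of (9.83).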

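Particularly interesting is the behavior of (9.69) in the limit R → ∞, which 
yields 

\[ P\int_{-\infty}^{\infty}\frac{f(x)}{x - a}\,dx = P\int_{-\infty}^{\infty}\frac{f(x) - f(a)}{x - a}\,dx. \tag{9.70} \]

Hence, substituting (9.70) into (9.65), we obtain 

\[
\begin{aligned}
f_R(a) &= \frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{f_I(x) - f_I(a)}{x - a}\,dx, \\
f_I(a) &= -\frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{f_R(x) - f_R(a)}{x - a}\,dx,
\end{aligned}
\tag{9.71}
\]

which are complementary expressions of a Hilbert transform pair (9.66). 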


Remark. Equation (9.70) is equivalent to 

\[ P\int_{-\infty}^{\infty}\frac{f(a)}{x - a}\,dx = 0, \quad \text{and thus} \quad P\int_{-\infty}^{\infty}\frac{dx}{x - a} = 0, \]

which readily follows from the result (9.68). 




9.5.3 Dispersion relations 


The mathematical arguments given so far are interesting in their own right, but 
their applications to the physical sciences are also significant. In the following 
discussions, we show that general physical quantities associated with response 
phenomena satisfy the Hilbert transform relations given in (9.66) and (9.71). 
In the language of physics, the relation between the two members of a Hilbert 
transform pair, referred to as a dispersion relation, plays an important role 
in describing the properties of response functions. 

We begin by considering a physical system for which an input I(t) is related 
to a response R(t) in the following linear manner: 

\[ R(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} G(t - t')\,I(t')\,dt'. \tag{9.72} \]

For example, I(t') might be the electric field acting on a physical object at a 
time t' and R(t) is the resulting polarization field at time t. We have assumed 
that G depends only on the difference t - t' because we want the system to 
respond to a sharp input at $t_0$, expressed by $I(t') = I_0\delta(t' - t_0)$, in the same 
way as it would respond to a sharp input at $t_0 + \tau$, i.e., at a time τ later. For 
the first case, we have 

\[ R_1(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} G(t - t')\,I_0\,\delta(t' - t_0)\,dt' = \frac{I_0}{\sqrt{2\pi}}\,G(t - t_0), \tag{9.73} \]

and for the second, 

\[ R_2(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} G(t - t')\,I_0\,\delta(t' - t_0 - \tau)\,dt' = \frac{I_0}{\sqrt{2\pi}}\,G(t - t_0 - \tau), \]

or, in other words, 

\[ R_2(t + \tau) = \frac{I_0}{\sqrt{2\pi}}\,G(t - t_0) = R_1(t). \]

Thus if we shift the input by τ, the response is also shifted by τ. 

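Now, in order to derive the dispersion relation for the physical systems of 
interest, we consider the Fourier transform of (9.72). Using the convolution 
theorem, we find that 

\[ r(\omega) = g(\omega)\,j(\omega), \]

where 

\[ r(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} R(t)\,e^{i\omega t}\,dt, \quad
   g(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} G(t)\,e^{i\omega t}\,dt, \]

\[ \text{and} \quad j(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} I(t)\,e^{i\omega t}\,dt. \]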

Notably, it is possible to extend g(ω) into the complex z-plane, based on 
the assumptions that 

(i) g(z) is analytic for Im z ≥ 0, and 
(ii) |g(z)| → 0 as |z| → ∞. 

Observe that (i) and (ii) are the conditions under which we derived the 
Hilbert transform pair (see Sect. 9.5.1). It follows that any 
g(z) arising from a G(t) that satisfies the necessary assumptions yields 

\[
\begin{aligned}
g_R(\omega) &= \frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{g_I(\omega')}{\omega' - \omega}\,d\omega', \\
g_I(\omega) &= -\frac{1}{\pi} P\int_{-\infty}^{\infty}\frac{g_R(\omega')}{\omega' - \omega}\,d\omega'.
\end{aligned}
\tag{9.74}
\]

These relations between $g_R$ and $g_I$ are called the dispersion relations for 
g. The validity of assumptions (i) and (ii) that the function g(z) must satisfy 
is demonstrated in Sect. 9.5.6. 


9.5.4 Kramers—Kronig Relations 


The term "dispersion relation" is often restricted to mean a relation between 
two functions whose arguments are experimentally accessible. 
In (9.74), however, the integrals run over negative as well as positive 
frequencies, whereas only positive frequencies (ω > 0) are accessible in 
practice, so the relations are not directly usable as they stand. In the following, we 
derive an alternative expression of the dispersion relations that involves only 
positive, experimentally meaningful frequencies. 

We first note that G(t) is real, as is evident from (9.73), where $R_1$ 
and $I_0$ are real. Hence, we may proceed as follows: 

\[ g(z) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{izt}\,dt, \]

\[ g^*(z) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{-iz^*t}\,dt = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{i(-z^*)t}\,dt. \tag{9.75} \]

As a consequence, we have 

\[ g^*(z) = g(-z^*), \]

which is referred to as the reality condition. 
Next let us assume z to be real (z = ω) in order to discuss the behavior 
of g(z) on the real axis. It follows from the reality condition (9.75) that 

\[ g_R(\omega) - i g_I(\omega) = g_R(-\omega) + i g_I(-\omega) \]

or 

\[ g_R(\omega) = g_R(-\omega) \quad \text{and} \quad g_I(\omega) = -g_I(-\omega). \tag{9.76} \]



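That is, $g_R$ and $g_I$ are even and odd functions of ω, respectively. Note that if 
the conditions in (9.76) are satisfied, the function 

\[ G(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(\omega)\,e^{-i\omega t}\,d\omega \]

becomes a real function. (The proof is left to the reader.) 
Now we rewrite the first part of (9.74) as 

\[ g_R(\omega) = \frac{1}{\pi} P\int_{-\infty}^{0}\frac{g_I(\omega')}{\omega' - \omega}\,d\omega' + \frac{1}{\pi} P\int_{0}^{\infty}\frac{g_I(\omega')}{\omega' - \omega}\,d\omega'. \]

We replace ω' → -ω' in the first integral and use (9.76) to obtain 

\[ g_R(\omega) = \frac{2}{\pi} P\int_0^{\infty}\frac{\omega'\,g_I(\omega')}{\omega'^2 - \omega^2}\,d\omega', \tag{9.77} \]

and an identical procedure yields 

\[ g_I(\omega) = -\frac{2\omega}{\pi} P\int_0^{\infty}\frac{g_R(\omega')}{\omega'^2 - \omega^2}\,d\omega'. \tag{9.78} \]

Thus the expressions (9.77) and (9.78) involve only positive, experimentally 
accessible frequencies. These equations are referred to as the Kramers-Kronig 
relations. 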
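The relations (9.77) and (9.78) can be tested on a simple causal response. The sketch below is an illustration under explicit assumptions, not an example from the text: it takes G(t) = exp(-γt) for t ≥ 0 with a hypothetical damping rate γ, for which the transform is proportional to 1/(γ - iω), and drops the common factor 1/√(2π) because both relations are linear. The principal value is handled by subtracting h(ω), which is legitimate because the principal value of the integral of 1/(ω'² - ω²) over the positive half-line is zero for ω > 0, and the half-line is mapped to a finite interval by ω' = tan θ.

```python
import math

GAMMA = 0.5  # hypothetical damping rate of the model response G(t) = exp(-GAMMA * t), t >= 0

def g_real(w):
    """Real part of 1/(GAMMA - i w): an even function of w."""
    return GAMMA / (GAMMA**2 + w**2)

def g_imag(w):
    """Imaginary part of 1/(GAMMA - i w): an odd function of w."""
    return w / (GAMMA**2 + w**2)

def pv_half_line(h, w, n=20000):
    """P-integral of h(w')/(w'^2 - w^2) over w' in (0, inf), for w > 0."""
    hw = h(w)
    step = (math.pi / 2.0) / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * step            # midpoint nodes in (0, pi/2)
        wp = math.tan(theta)                # w' = tan(theta), dw' = dtheta / cos^2(theta)
        denom = wp * wp - w * w
        if denom == 0.0:
            continue                        # skip a node that lands exactly on the pole
        total += (h(wp) - hw) / denom / math.cos(theta) ** 2
    return total * step

w = 1.3
kk_real = (2.0 / math.pi) * pv_half_line(lambda x: x * g_imag(x), w)   # right side of (9.77)
kk_imag = -(2.0 * w / math.pi) * pv_half_line(g_real, w)               # right side of (9.78)
print(kk_real, g_real(w))
print(kk_imag, g_imag(w))
```

Each printed pair should agree to within the quadrature error, showing that the real part can be recovered from the imaginary part at positive frequencies alone, and vice versa.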


9.5.5 Subtracted Dispersion Relation 


In deriving dispersion relations, it often happens that the quantity of interest, 
say g(z), does not tend toward zero as |z| — oo. Furthermore, we are not 
usually fortunate enough to know the precise behavior of the quantity as 
|z| tends to infinity. Nevertheless, if we at least know that the quantity is 
bounded for large values of |z|, the dispersion relation can be reformulated in 
the following way: 

Suppose that f(z) is analytic in the upper half-plane, and let $a_0$ be some 
point on the real axis at which f(z) is analytic. Our aim is to derive the 
dispersion relation for f(z) under the condition that the asymptotic behavior 
of f(z) for z → ∞ is unknown. Then, instead of f(z), we consider the function 

\[ \phi(z) = \frac{f(z) - f(a_0)}{z - a_0}, \]

which is also analytic in the upper half-plane and not singular at $z = a_0$, and 
$|\phi(z)| \to 0$ as |z| → ∞ owing to the boundedness of |f(z)| for z → ∞. Thus, 
in a manner similar to the case in (9.65), we can write 

\[ i\pi\phi(x) = P\int_{-\infty}^{\infty}\frac{\phi(x')}{x' - x}\,dx'. \]




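In actuality, we have 

\[
\begin{aligned}
i\pi\,\frac{f(x) - f(a_0)}{x - a_0}
&= P\int_{-\infty}^{\infty}\frac{f(x') - f(a_0)}{(x' - x)(x' - a_0)}\,dx' \\
&= P\int_{-\infty}^{\infty}\frac{f(x')}{(x' - x)(x' - a_0)}\,dx'
 - \frac{f(a_0)}{x - a_0}\, P\int_{-\infty}^{\infty}\left(\frac{1}{x' - x} - \frac{1}{x' - a_0}\right)dx',
\end{aligned}
\]

so that 

\[
i\pi f(x) = i\pi f(a_0) + (x - a_0)\, P\int_{-\infty}^{\infty}\frac{f(x')}{(x' - x)(x' - a_0)}\,dx'
 - f(a_0)\, P\int_{-\infty}^{\infty}\frac{dx'}{x' - x} + f(a_0)\, P\int_{-\infty}^{\infty}\frac{dx'}{x' - a_0}.
\]

The last two principal value integrals are equal to zero as we demonstrate later 
in (9.83). Hence, separating the real and imaginary parts, we finally obtain 

\[
\begin{aligned}
f_R(x) &= f_R(a_0) + \frac{x - a_0}{\pi}\, P\int_{-\infty}^{\infty}\frac{f_I(x')}{(x' - x)(x' - a_0)}\,dx', \\
f_I(x) &= f_I(a_0) - \frac{x - a_0}{\pi}\, P\int_{-\infty}^{\infty}\frac{f_R(x')}{(x' - x)(x' - a_0)}\,dx'.
\end{aligned}
\tag{9.79}
\]

Relations of the type of (9.79) are referred to as once-subtracted dispersion 
relations. Emphasis is placed on the fact that the relations (9.79) are 
free from the assumption that |f(z)| should vanish in the limit z → ∞. For 
them to be of use in a particular physical problem, we must have a means of 
determining, say, $f_R(a_0)$ for some $a_0$. 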


9.5.6 Derivation of Dispersion Relations 


This subsection provides a proof of the dispersion relation (9.74). We shall see 
that by making a few very reasonable assumptions about the system in ques- 
tion, we can show that the real and imaginary parts of the physical quantity 
g(w) are intimately related to one another for real values of w (i.e., a disper- 
sion relation). The key assumption is the causality requirement: we may 
say that causality of the function G(t) implies the analytic properties of g(z) 
in the upper half-plane and thus verifies the dispersion relations with respect 
to g(w) on the real axis. 

Toward this end, let us consider what can be said about G(τ) on general 
physical grounds. First to be noted is that an input at t should not give rise 
to a response at times prior to t, i.e., G(τ) = 0 for τ < 0. Thus we have 

\[ R(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t} G(t - t')\,I(t')\,dt', \tag{9.80} \]




which shows that the response at t is the weighted linear superposition of all 
inputs prior to t, which is the causality requirement. 

Secondly, the possibility that G(τ) is singular for any finite τ is excluded 
because, on physical grounds, the response from a sharp input given by 

\[ R(t) = \frac{I_0}{\sqrt{2\pi}}\,G(t - t_0), \quad t > t_0 \]

must always be finite. 

Finally, it is assumed that the effect of an input in the remote past does 
not appreciably influence the present. This may be stated as the requirement 
that G(τ) → 0 as τ → ∞, since the response to any impulse dies down after 
a sufficiently long time (i.e., any system has some dissipative mechanism). 
Furthermore, G(τ) should vanish faster than $\tau^{-1}$ so that it becomes integrable. 
Recall that g(z) is defined through an integration of G(t) with respect to t. 

The following three points summarize our physically motivated assumptions 
on G(τ): 

(i) G(τ) = 0 for τ < 0, 
(ii) G(τ) is bounded for all τ, and 
(iii) |G(τ)| is integrable, so G(τ) → 0 faster than 1/τ as τ → ∞. 

We demonstrate below that these three assumptions for G(t) lead naturally 
to the two conditions for g(z) under which we have derived the dispersion 
relation of g(ω). 

First, we show that these three conditions require that |g(z)| → 0 as |z| → ∞ 
in the upper half-plane. It is possible to write 

\[ g(\omega) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{i\omega t}\,dt. \]

We extend this relation into the complex plane by using the definition 

\[ g(z) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{izt}\,dt = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{i\omega t}e^{-\eta t}\,dt, \]

where we have written z = ω + iη. We now restrict our attention to the 
upper half-plane (η > 0), where the term $e^{-\eta t}$ is a decaying exponential. 
Writing $z = |z|e^{i\theta}$, so that $\eta = |z|\sin\theta$, we have for 0 < θ < π 

\[ |g(z)| \le \frac{M}{\sqrt{2\pi}}\int_0^{\infty} e^{-\eta t}\,dt, \]

where we have replaced |G(t)| by its upper bound M in view of assumption 
(ii) above. Hence, we have 

\[ |g(z)| \le \frac{M}{\sqrt{2\pi}\,|z|\sin\theta}. \]

This means that for 0 < θ < π, |g(z)| → 0 as |z| → ∞. On the other hand, 
when θ = 0 or π, we have 

\[ g(\omega, \eta = 0) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{i\omega t}\,dt. \]

This results in Parseval's identity: 

\[ \int_{-\infty}^{\infty} |g(\omega, \eta = 0)|^2\,d\omega = \int_0^{\infty} |G(t)|^2\,dt, \]

where the improper integrals on both sides converge. (See Sect. 3.4.2 for the 
convergence conditions of an improper integral.) Thus |g(ω, η = 0)| vanishes as 
ω → ∞. As a result, |g(z)| → 0 as |z| → ∞ in the whole region 0 ≤ θ ≤ π, 
i.e., in any direction in the upper half-plane. 

Now we want to show that g(z) is analytic in the upper half-plane. Using 

\[ g(z) = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{izt}\,dt = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,e^{i\omega t}e^{-\eta t}\,dt, \tag{9.81} \]

we see that for η > 0, 

\[ \frac{d^n g}{dz^n} = \frac{1}{\sqrt{2\pi}}\int_0^{\infty} G(t)\,(it)^n e^{izt}\,dt = \frac{i^n}{\sqrt{2\pi}}\int_0^{\infty} t^n G(t)\,e^{i\omega t}e^{-\eta t}\,dt. \tag{9.82} \]

The integrals in (9.82) are uniformly convergent owing to the term $e^{-\eta t}$ (η > 0, 
t > 0). Thus g(z) is analytic in the upper half-plane (η > 0). Hence, for any 
g(z) arising from a G(t) that satisfies assumptions (i), (ii), and (iii), we can 
proceed according to the argument in Sect. 9.5.1, and we finally obtain the 
dispersion relation (9.74). 


Exercises 


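1. Prove that 

\[ P\int_{-R}^{R}\frac{dx}{x - a} = \ln\left(\frac{R - a}{R + a}\right) \]

when -R < a < R. 

Solution: We write 

\[ P\int_{-R}^{R}\frac{dx}{x - a} = \lim_{\varepsilon\to 0}\left[\int_{-R}^{a-\varepsilon}\frac{dx}{x - a} + \int_{a+\varepsilon}^{R}\frac{dx}{x - a}\right]. \]

Setting x = -y in the first integral on the right-hand side, we find 
that 

\[
\begin{aligned}
P\int_{-R}^{R}\frac{dx}{x - a}
&= \lim_{\varepsilon\to 0}\left[\int_{R}^{\varepsilon - a}\frac{dy}{y + a} + \ln(R - a) - \ln\varepsilon\right] \\
&= \lim_{\varepsilon\to 0}\left[\ln\varepsilon - \ln(R + a) + \ln(R - a) - \ln\varepsilon\right] \\
&= \ln\left(\frac{R - a}{R + a}\right) \quad (-R < a < R). \quad ♣
\end{aligned}
\tag{9.83}
\]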


2. By using the formula (9.71), prove that 

\[ \int_{-\infty}^{\infty}\frac{\sin x}{x}\,dx = \pi. \]

Solution: Consider the function $f(z) = e^{iz}$. This function is analytic 
everywhere, and if we write $z = Re^{i\theta}$, then |f(z)| → 0 as 
R → ∞ for all θ such that 0 < θ < π. In this case, $f_R(x) = \cos x$ 
and $f_I(x) = \sin x$, so using (9.71), we obtain 

\[ \cos a = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin x - \sin a}{x - a}\,dx. \]

Since $\sin x - \sin a = 2\sin[(x - a)/2]\cos[(x + a)/2]$, there is no 
singularity of the integrand at x = a. For the special case a = 0, we find 
that $1 = (1/\pi)\int_{-\infty}^{\infty}(\sin x/x)\,dx$, i.e., $\int_{-\infty}^{\infty}(\sin x/x)\,dx = \pi$. From 
this result, we also obtain $\int_0^{\infty}(\sin x/x)\,dx = \pi/2$ by symmetry. ♣ 

3. Show that the integral 

\[ S(t) = \frac{1}{2\pi i}\lim_{\varepsilon\to 0}\int_{-\infty}^{\infty}\frac{e^{ixt}}{x - i\varepsilon}\,dx \]

reads 

\[ S(t) = \begin{cases} 1, & t > 0, \\ 0, & t < 0. \end{cases} \]

Solution: Closing the contour in the upper half-plane (Im z > 0) for t > 0 
and in the lower half-plane (Im z < 0) for t < 0, we have the desired result, 
which is the integral representation of Heaviside's step function. ♣ 


10 


Conformal Mapping 


Abstract Conformal mapping refers to a transformation from one complex plane 
to another such that the local angles and shapes of infinitesimally small figures 
are preserved. This special class of mapping is indispensable for solving physics 
and engineering problems that are expressed in terms of complex functions with 
inconvenient geometries. In this chapter we show that a problem can be drastically 
simplified by choosing an appropriate mapping, which allows us to evaluate the 
solution using elementary calculus. 


10.1 Fundamentals 


10.1.1 Conformal Property of Analytic Functions 


We are concerned here with the mapping properties of an analytic function 
w = f(z) that maps a domain D on the z-plane into the w-plane. Through the 
mapping, any line drawn on the z-plane results in a line on the w-plane. 
Particularly when f = u + iv is analytic, the transformation is angle-preserving 
or conformal. This means that through the transformation from (x, y) to 
(u, v), the angle between the crossing lines on the w-plane is equal to the angle 
between the crossing lines on the z-plane (see Fig. 10.1). In physics and 
engineering, the subject derives its usefulness from the possibility of transforming 
a problem that occurs naturally in a rather difficult setting into another 
simpler one. 

Let D be a domain on the z-plane, and let $\Gamma_1$ and $\Gamma_2$ be two differentiable 
arcs lying in D and intersecting at a point z = a in D. If f(z) is an analytic 
function in D, the images $f(\Gamma_1)$ and $f(\Gamma_2)$ are differentiable arcs lying in a 
domain D' = f(D) and intersecting at a point a' = f(a). Then we say the 
following: 




Fig. 10.1. Angle-preserving property of a conformal mapping w = f(z) 


@ Conformal mapping: 

The mapping w = f(z) is conformal at z = a if for every such pair of 
arcs, the angle between the arcs $\Gamma_1$ and $\Gamma_2$ intersecting at z = a on the 
z-plane is equal to the angle between the arcs $f(\Gamma_1)$ and $f(\Gamma_2)$ at their 
intersecting point f(a) on the w-plane. 


The mapping is said to be conformal in D if it is conformal at each point in D. 
We shall see that if a function w = f(z) is analytic, it is necessarily conformal 
except at a finite number of specific points; this fact is formally stated below. 


@ Theorem: 
Given an analytic function f(z), the mapping w = f(z) is conformal at 
z = a if and only if $f'(a) \neq 0$. 


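Proof To prove sufficiency, we consider the arcs $\Gamma_1$ and $\Gamma_2$ given 
parametrically by 

\[ z_1 = \psi_1(t) \quad \text{and} \quad z_2 = \psi_2(t) \quad (0 \le t \le 1) \]

and assume that $z_1$, $z_2$ are points on $\Gamma_1$, $\Gamma_2$ at a short distance $\ell$ from z = a. 
Then, from the relation 

\[ z_1 - a = \ell e^{i\alpha}, \quad z_2 - a = \ell e^{i\beta}, \]

we have the ratio 

\[ \frac{z_2 - a}{z_1 - a} = e^{i(\beta - \alpha)}. \]

As $\ell \to 0$, β - α must approach the angle θ between the curves on the z-plane. 
That is, 

\[ \theta = \lim_{\ell\to 0}\arg\left(\frac{z_2 - a}{z_1 - a}\right). \]

For the angle θ' between the arcs $f(\Gamma_1)$ and $f(\Gamma_2)$ at f(a), we have 

\[
\begin{aligned}
\theta' &= \lim_{\ell\to 0}\arg\left[\frac{f(z_2) - f(a)}{f(z_1) - f(a)}\right] \\
&= \lim_{\ell\to 0}\arg\left[\frac{\dfrac{f(z_2) - f(a)}{z_2 - a}\,(z_2 - a)}{\dfrac{f(z_1) - f(a)}{z_1 - a}\,(z_1 - a)}\right] \\
&= \lim_{\ell\to 0}\arg\left[\frac{f'(a)\,(z_2 - a)}{f'(a)\,(z_1 - a)}\right] = \theta, \quad \text{if } f'(a) \neq 0.
\end{aligned}
\tag{10.1}
\]

Thus, the condition $f'(a) \neq 0$ is sufficient. Conversely, if $f^{(n)}(a) = 0$ for 
$n = 1, 2, \cdots, p - 1$ and $f^{(p)}(a) \neq 0$, near z = a we have 

\[ f(z) = f(a) + \frac{f^{(p)}(a)}{p!}(z - a)^p + O[(z - a)^{p+1}]. \]

Thus, we get 

\[
\begin{aligned}
\theta' &= \lim_{\ell\to 0}\arg\left[\frac{f(z_2) - f(a)}{f(z_1) - f(a)}\right]
 = \lim_{\ell\to 0}\arg\left[\frac{f^{(p)}(a)\,(z_2 - a)^p}{f^{(p)}(a)\,(z_1 - a)^p}\right] \\
&= \lim_{\ell\to 0}\arg\left[\left(\frac{z_2 - a}{z_1 - a}\right)^p\right] = p\theta,
\end{aligned}
\]

which shows that the angle is magnified by p. Therefore, if the mapping 
w = f(z) is conformal, we necessarily have p = 1, which completes the proof 
of the necessity of the condition. ♣ 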
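The two cases in the proof can be observed numerically. The following is a small sketch under stated assumptions: the map w = z² and the ray directions α = 0.3, β = 1.0 are arbitrary illustrative choices, `image_angle` is a hypothetical helper, and the two arcs are replaced by short straight segments of length ℓ leaving the point a.

```python
import cmath

def image_angle(f, a, alpha, beta, ell=1e-6):
    """Angle between the images of two short segments leaving a in directions alpha and beta."""
    z1 = a + ell * cmath.exp(1j * alpha)
    z2 = a + ell * cmath.exp(1j * beta)
    return cmath.phase((f(z2) - f(a)) / (f(z1) - f(a)))

def square(z):
    return z * z

alpha, beta = 0.3, 1.0                                   # original angle: beta - alpha = 0.7
at_regular = image_angle(square, 1 + 1j, alpha, beta)    # f'(a) = 2a is nonzero here
at_critical = image_angle(square, 0j, alpha, beta)       # f'(0) = 0 and f''(0) is nonzero, so p = 2
print(at_regular, at_critical)
```

At the regular point the image angle stays close to 0.7, while at the origin it comes out as 1.4, i.e., magnified by p = 2.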


10.1.2 Scale Factor 


There is another important geometric property that analytic functions pos- 
sess: whenever f(z) is analytic, any infinitesimal figure plotted on the z-plane 
is transformed into a similar figure on the w-plane with a change in size but 
with the proportions (and angles) preserved. We prove this by considering the 
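length of an infinitesimally small quantity df given by 

\[ df = du + i\,dv = \left(\frac{\partial u}{\partial x}dx + \frac{\partial u}{\partial y}dy\right) + i\left(\frac{\partial v}{\partial x}dx + \frac{\partial v}{\partial y}dy\right). \tag{10.2} \]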




Its squared length reads 

\[
\begin{aligned}
|df|^2 &= \left(\frac{\partial u}{\partial x}dx + \frac{\partial u}{\partial y}dy\right)^2 + \left(\frac{\partial v}{\partial x}dx + \frac{\partial v}{\partial y}dy\right)^2 \\
&= \left[\left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial x}\right)^2\right]dx^2
 + \left[\left(\frac{\partial u}{\partial y}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2\right]dy^2 \\
&\quad + 2\left(\frac{\partial u}{\partial x}\frac{\partial u}{\partial y} + \frac{\partial v}{\partial x}\frac{\partial v}{\partial y}\right)dx\,dy.
\end{aligned}
\tag{10.3}
\]

Substituting the Cauchy-Riemann relations into (10.3), we obtain 

\[ |df|^2 = h^2|dz|^2, \quad \text{where} \quad
h^2 = \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2
    = \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2. \tag{10.4} \]

The quantity h is known as a scale factor and measures the magnification ratio 
of the elementary lines through the transformation w = f(z). From (10.4), it 
readily follows that 

\[ h = \left|\frac{df}{dz}\right|. \tag{10.5} \]

We see from (10.5) that since df/dz is isotropic, the scale factor h is also 
isotropic (i.e., independent of the direction of dz) for any analytic function f. 
This means that any infinitesimal figures on the z-plane are transformed into 
similar figures on the w-plane with a change in their size by h = |df/dz|. 

Note that the magnitude of h depends on the point z and may vanish at 
points where f'(z) = 0. Points where f'(z) = 0 are called critical points of 
the transformation w = f(z), and at these points, the transformation is not 
conformal. The simplest example is 

\[ w = z^2 \quad \text{at } z = 0, \]

for which we have 

\[ h = |f'(0)| = 0. \]

In fact, when two line elements passing through z = 0 make an angle β - α with 
respect to one another, the corresponding lines on the w-plane make an angle 
of 2(β - α). Thus the mapping is not conformal at z = 0. In general, the region 
in the neighborhood of the point at which h = 0 on the w-plane becomes 
greatly compressed. In contrast, the corresponding region on the z-plane is 
tremendously expanded. 
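The isotropy of the scale factor can be made concrete by measuring the stretch |Δw|/|Δz| in several directions. The sketch below is illustrative only: the analytic map w = z³, the non-analytic comparison map x + 2iy, the base point, and the step size are all arbitrary choices made here, and `stretch` is a hypothetical helper.

```python
import cmath
import math

def stretch(f, z, direction, eps=1e-6):
    """Finite-difference estimate of |df|/|dz| for a small step in the given direction."""
    dz = eps * cmath.exp(1j * direction)
    return abs(f(z + dz) - f(z)) / abs(dz)

def cube(z):
    return z**3                              # analytic, f'(z) = 3 z^2

def shear(z):
    return complex(z.real, 2.0 * z.imag)     # not analytic: stretches only the y-direction

z0 = 1 + 0.5j
directions = [k * math.pi / 6 for k in range(6)]
analytic_ratios = [stretch(cube, z0, d) for d in directions]
shear_ratios = [stretch(shear, z0, d) for d in directions]
print(analytic_ratios)   # all close to |3 z0^2| = 3.75
print(shear_ratios)      # varies between 1 and 2 with direction
```

For the analytic map every direction gives the same ratio h = |f'(z0)|, whereas the non-analytic map magnifies by 1 along x and by 2 along y, so small figures are distorted rather than merely rescaled.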


10.1.3 Mapping of a Differential Area 


The scale factor h given in (10.4) can be derived in a different way by consid- 
ering the conformal mapping of a differential area. Let f(z) be a conformal 
mapping that transforms any points in D of the z-plane onto a region S' of the 




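w-plane. In the domain D, we define a rectangular differential area element 
with sides of the rectangle parallel to the x- and y-axes. These sides are given by 

\[ dz_1 = dx \quad \text{and} \quad dz_2 = i\,dy. \]

The images of $dz_1$ and $dz_2$ are differential curves in the w-plane given by 

\[ dw_1 = du_1 + i\,dv_1 \quad \text{and} \quad dw_2 = du_2 + i\,dv_2. \]

Note that the differential area element of the rectangle in the z-plane reads 
$dA_z = dx\,dy$ and that of the parallelogram in the w-plane is 

\[ dA_w = \left|\mathrm{Im}(dw_1^*\,dw_2)\right|. \]

Since $dz_1 = dx$ and $dz_2 = i\,dy$, the images of these line elements can be written 
as 

\[
\begin{aligned}
dw_1 &= \frac{\partial f}{\partial x}\,dx = \left(\frac{\partial u}{\partial x} + i\frac{\partial v}{\partial x}\right)dx, \\
dw_2 &= \frac{\partial f}{\partial y}\,dy = \left(\frac{\partial u}{\partial y} + i\frac{\partial v}{\partial y}\right)dy.
\end{aligned}
\]

Therefore, $dA_w$ is given by 

\[ dA_w = \left|\frac{\partial u}{\partial x}\frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}\frac{\partial v}{\partial x}\right|dx\,dy = \left|\frac{\partial(u, v)}{\partial(x, y)}\right|dA_z, \]

where 

\[ \frac{\partial(u, v)}{\partial(x, y)} = \frac{\partial u}{\partial x}\frac{\partial v}{\partial y} - \frac{\partial u}{\partial y}\frac{\partial v}{\partial x}
 = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2mm] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix} \]

is called the Jacobian determinant of the transformation. Since f(z) is 
analytic, u and v satisfy the Cauchy-Riemann relations over the domain D, so 
the Jacobian determinant can be written as 

\[ \frac{\partial(u, v)}{\partial(x, y)} = \left(\frac{\partial u}{\partial x}\right)^2 + \left(\frac{\partial u}{\partial y}\right)^2 = \left(\frac{\partial v}{\partial x}\right)^2 + \left(\frac{\partial v}{\partial y}\right)^2. \tag{10.7} \]

This provides a geometric interpretation of the Jacobian determinant 
$\partial(u, v)/\partial(x, y)$; namely, it is identical to the square of the scale factor h 
introduced in (10.4). 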
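The identity between the Jacobian determinant and h² can be checked by finite differences. This is a minimal sketch assuming the sample map w = exp(z) and an arbitrary base point; the helper name `jacobian` and the step size are choices made for illustration.

```python
import cmath

def jacobian(f, z, eps=1e-6):
    """Central-difference estimate of u_x v_y - u_y v_x for w = f(z) = u + i v."""
    fx = (f(z + eps) - f(z - eps)) / (2.0 * eps)              # u_x + i v_x
    fy = (f(z + 1j * eps) - f(z - 1j * eps)) / (2.0 * eps)    # u_y + i v_y
    return fx.real * fy.imag - fy.real * fx.imag

z0 = 0.4 + 0.9j
jac = jacobian(cmath.exp, z0)
h_squared = abs(cmath.exp(z0)) ** 2    # |f'(z0)|^2, since (exp z)' = exp z
print(jac, h_squared)
```

Both numbers should agree up to the finite-difference error, so a small area element around z0 is magnified by the factor h².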


10.1.4 Mapping of a Tangent Line 


We consider the mapping of a tangent line. Let C be a curve in the z-plane 
and Γ be the image of C in the w-plane (see Fig. 10.2). A differential segment 
dw along Γ is related to the differential segment dz along C by 




\[ dw = \frac{df}{dz}\,dz = f'(z)\,dz. \tag{10.8} \]

We suppose $w_0$ to be a point on Γ that is the image of $z_0$ on C. Then from 
(10.8), the tangent to Γ at $w_0$, denoted by $\tau(w_0)$, is related to the tangent to 
C at $z_0$, denoted by $t(z_0)$: 

\[ \tau(w_0) = \left.\frac{dw}{d\lambda}\right|_{w_0} = f'(z_0)\left.\frac{dz}{d\lambda}\right|_{z_0} = f'(z_0)\,t(z_0), \tag{10.9} \]

where λ parametrizes the curve Γ on the w-plane (and, through w = f(z), the 
curve C on the z-plane). 

An immediate consequence of equation (10.9) is that if $f'(z_0) = 0$, the 
tangent $t(z_0)$ on the z-plane cannot be related to the tangent $\tau(w_0)$ on the 
w-plane. The point $z_0$ that satisfies $f'(z_0) = 0$ is called a critical point on 
the curve. For simplicity in the following discussion, we assume that the curve 
C does not contain any critical points. 

The characteristics of the mapping (10.9) become clear by employing the 
polar form: 

\[ \tau(w_0) = |\tau(w_0)|\,e^{i\psi(w_0)}, \quad f'(z_0) = |f'(z_0)|\,e^{i\phi(z_0)}, \quad \text{and} \quad t(z_0) = |t(z_0)|\,e^{i\theta(z_0)}. \tag{10.10} \]

The first equation shows that $\tau(w_0)$ is oriented at an angle $\psi(w_0)$ to the 
u-axis; similarly, the third one shows that $t(z_0)$ makes an angle $\theta(z_0)$ with the 
x-axis. It follows from (10.9) that 

\[ |\tau(w_0)|\,e^{i\psi(w_0)} = |f'(z_0)|\,|t(z_0)|\,e^{i[\phi(z_0) + \theta(z_0)]}. \]

Thus the magnitude of $\tau(w_0)$ and its argument read 

\[ |\tau(w_0)| = |f'(z_0)|\,|t(z_0)| \quad \text{and} \quad \psi(w_0) = \phi(z_0) + \theta(z_0). \]

Each equation gives us the properties of the conformal mapping of a tangent 
line as follows: 


(i) The magnitude of the tangent $|t(z_0)|$ is modified by the scale factor 
$|f'(z_0)|$, thus being enlarged or shrunk by the mapping. Since $|f'(z_0)|$ 
depends on $z_0$, the magnification varies from point to point on C. 


Fig. 10.2. Conformal mapping of a tangential line 




(ii) The angle between the tangent t(z) and the x-axis at $z_0$ differs from the 
angle between the tangent τ(w) and the u-axis at $w_0$. The difference is 
determined by the argument of $f'(z_0)$, denoted by φ and called the argument 
of the mapping; φ also depends on $z_0$ and thus varies from point to point 
on C. 


10.1.5 The Point at Infinity 


For later use, we introduce a few concepts that are at the basis of further 
investigations on conformal mapping. Our aim is to understand the way in 
which the entire spherical curved surface is mapped conformally onto the 
entire flat plane with a one-to-one correspondence. This is achieved with the 
help of a stereographic projection between the complex plane and an artificial 
sphere as described below. 

Let us consider a sphere of radius R (for convenience, R is taken as 1/2) 
such that the complex plane is tangential to it at the origin, as shown in 
Fig. 10.3. The point P on the sphere opposite the origin (called the north 
pole, for convenience) is used as the “eye” of the stereographic projection. We 
draw straight lines through P that intersect both the sphere and the plane. 
These lines permit a mapping of point z on the plane onto the point ζ on 
the sphere (see Fig. 10.3). In this fashion the entire complex plane is mapped 
onto the sphere (called a Riemann or a complex sphere). 

As to the properties of the Riemann sphere, the following statements can 
be verified without much difficulty. 

1. Straight lines in the z-plane are mapped onto circles on the sphere that 

pass through P. 


Fig. 10.3. Riemann sphere 




2. The images of intersecting straight lines on the plane have two common 
points on the Riemann sphere, one of which is P. 

3. The images of parallel straight lines on the z-plane have only the point P 
in common, and they have a common tangent at P. 

4. The exterior of a circle |z| = R with R > 1 is mapped onto the interior 
of a small spherical cap around point P. As R → ∞ the cap shrinks to P. 
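The projection and the first and fourth properties above can be checked with explicit coordinates. The sketch below rests on assumptions not stated in the text: it places the sphere of radius 1/2 with its centre at (0, 0, 1/2), so that it touches the plane at the origin and P = (0, 0, 1), and it derives the image point by intersecting the line through P and (x, y, 0) with the sphere. The function names and the sample line y = 2x + 1 are hypothetical.

```python
def to_sphere(z):
    """Stereographic image of z on the sphere of radius 1/2 centred at (0, 0, 1/2)."""
    s = 1.0 + abs(z) ** 2
    return (z.real / s, z.imag / s, abs(z) ** 2 / s)

def from_sphere(point):
    """Inverse projection from a sphere point (other than P) back to the plane."""
    X, Y, Z = point
    return complex(X, Y) / (1.0 - Z)

# Every image lies on the sphere X^2 + Y^2 + (Z - 1/2)^2 = 1/4.
X, Y, Z = to_sphere(3 - 4j)
on_sphere_residual = X * X + Y * Y + (Z - 0.5) ** 2 - 0.25

# Property 1: points of the line y = 2x + 1 land in the plane -2X + Y + Z = 1,
# which contains P = (0, 0, 1); the image is therefore a circle through P.
plane_residuals = []
for t in (-2.0, 0.0, 0.5, 7.0):
    X, Y, Z = to_sphere(complex(t, 2.0 * t + 1.0))
    plane_residuals.append(-2.0 * X + Y + Z - 1.0)

# Property 4: points far from the origin are mapped close to P.
far_height = to_sphere(1e6)[2]
print(on_sphere_residual, plane_residuals, far_height)
```

The residuals vanish up to rounding, and the height of the image of a distant point approaches 1, the height of P.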


Note that the point P itself has no counterpart on the z-plane. Never- 
theless, it has been found convenient to adjoin an extra point to the z-plane, 
known as the point at infinity, in such a way that a curve passing through P 
on the Riemann sphere is the image of a curve on the z-plane that approaches 
the point at infinity. 


@ Point at infinity: 
The point at infinity z = ∞ is defined as the point z that is mapped 
onto the origin $\tilde{z} = 0$ by the transformation $\tilde{z} = 1/z$. 


The importance of the point at infinity is greatly enhanced once we appreci- 
ate the conformal property of the stereographic projection: i.e., if two curves 
intersect on the z-plane at an angle γ, then their images on the sphere 
intersect at the same angle. This conformal property permits the definition of the 
angle between two parallel straight lines on the z-plane, i.e., the angle that 
their images make on the sphere at point P. (Indeed this angle is equal to 
zero as noted in 3 above.) 


10.1.6 Singular Point at Infinity 


The concept of a point at infinity is closely interwoven with the study of 
singularities of analytic functions. The notion of analyticity can be extended 
to a point at infinity by the following device: A function f(z) is considered to 
be analytic at infinity if the function 

\[ g(z) = f\left(\frac{1}{z}\right) \]

is analytic at z = 0. A more precise statement on this matter is given below. 


@ Extended definition of conformal mappings: 

A function w = f(z) is said to transform the neighborhood of a point $z_0$ 
conformally into a neighborhood of w = ∞ if the function $\tilde{w} = 1/f(z)$ 
transforms the neighborhood of $z_0$ conformally into a neighborhood of 
$\tilde{w} = 0$. 




Example The mapping $w = 1/z$ is conformal at the origin $z = 0$. Initially,
the function $f(z) = 1/z$ is not defined at $z = 0$; however, the subterfuge
based on the Riemann sphere makes the mapping $w = 1/z$ meaningful (and,
furthermore, conformal) at $z = 0$. Note that it is also conformal at $z = \infty$
even though the derivative $f'(z)$ approaches zero as $z \to \infty$.


Owing to the above convention, it becomes possible to introduce the concept
of a pole at infinity, a branch point at infinity, and so on, through the
corresponding behavior of $g(z) = f(1/z)$ at the origin. In fact, owing to our
convention, the function $f(z) = e^z$, which has no singularities in the original
z-plane, comes to possess an essential singularity at infinity. Other functions
that have no singularities in the finite plane (e.g., all the nonconstant
polynomials in $z$) are also found to have a breakdown of analyticity at infinity.
In contrast, nonconstant functions that are analytic at infinity possess at
least one singularity for some finite value of $z$. The natural conjecture is
that no nonconstant function is analytic everywhere, infinity included. This
problem has actually been resolved and is embodied in the theorem below.


@ Entire function: 

A function $f(z)$ whose only singularity is an isolated singularity at the
point at infinity $z = \infty$ is called an entire function (or integral
function). If this singularity is a pole of $m$th order, then $f(z)$ must be a
polynomial of degree $m$.


@ Liouville theorem: 
The only function f(z) that is analytic in the entire complex plane as 
well as at the point at infinity is the constant function f(z) = const. 
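
As a brief worked illustration of the device $g(z) = f(1/z)$ (the three sample functions below are chosen here for illustration and are not taken from the text), one may check the behavior at infinity directly:

```latex
\begin{align*}
f(z) &= z^m: & g(z) &= f(1/z) = z^{-m}
  && \text{(pole of order $m$ at $z = 0$, i.e., $f$ has a pole of order $m$ at infinity)},\\
f(z) &= e^{z}: & g(z) &= e^{1/z} = \sum_{n=0}^{\infty} \frac{1}{n!\, z^{n}}
  && \text{(infinitely many negative powers: essential singularity at infinity)},\\
f(z) &= \frac{1}{1+z^{2}}: & g(z) &= \frac{z^{2}}{1+z^{2}}
  && \text{(analytic at $z = 0$, but $f$ has poles at the finite points $z = \pm i$)}.
\end{align*}
```

The first line is consistent with the statement on entire functions, and the third with the Liouville theorem: a nonconstant function that is analytic at infinity has a singularity somewhere in the finite plane.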


Remark. In some texts the term “complex plane” is tacitly assumed to mean 
the extended complex plane with the point at infinity included. Certain 
theorems may then be stated more conveniently. However, one should never 
forget that while there is a point at infinity, there is still no such thing as a 
complex number “infinity” in the sense that it possesses the algebraic prop- 
erties shared by other complex numbers. 


Exercises 


1. Suppose that two differentiable curves on the z-plane meet at a point $z_0$ at
which $f'(z_0) = f''(z_0) = \cdots = f^{(m-1)}(z_0) = 0$ and $f^{(m)}(z_0) \neq 0$. Show
that the angle $\theta$ between the two curves is magnified by $m$ times through
the conformal mapping $w = f(z)$.

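Solution: From hypothesis, $f(z)$ can be expanded in the
neighborhood of the point $z_0$ as

$$ f(z) = f(z_0) + c_m (z - z_0)^m + c_{m+1} (z - z_0)^{m+1} + \cdots, $$

where $c_m \neq 0$. Then, by the same scenario as we used in deriving
(10.1), the angle $\Theta$ between the mapped arcs at $f(z_0)$ reads

$$ \Theta = \lim_{\varepsilon \to 0} \arg \frac{f(z_2) - f(z_0)}{f(z_1) - f(z_0)}
        = \lim_{\varepsilon \to 0} \arg \left( \frac{z_2 - z_0}{z_1 - z_0} \right)^{m}
        = m \lim_{\varepsilon \to 0} \arg \frac{z_2 - z_0}{z_1 - z_0} = m\theta, $$

where $z_1$ and $z_2$ are points on the two curves that tend to $z_0$ as
$\varepsilon \to 0$. ♣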


2. We say that the mapping $w = f(z)$ is locally one-to-one at $z_0$ if
$f(z_1) \neq f(z_2)$ for any two distinct points $z_1$ and $z_2$ within the circle
$|z - z_0| < \delta$ with some $\delta > 0$. Show that $w = f(z)$ is locally one-to-one
at $z_0$ if $f(z)$ is analytic at $z_0$ and $f'(z_0) \neq 0$.


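Solution: Let $f(z_0) = \alpha$ and take $\delta > 0$ small enough so that
$f(z) - \alpha$ has no other zero in $|z - z_0| < \delta$. In view of the theorem
regarding the isolated property of zeros, such a $\delta$ can always be
found. The argument principle says that

$$ \frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z) - \alpha} \, dz = 1, $$

where $C$ is a circle $|z - z_0| = \delta$. Denoting $\Gamma = f(C)$, we have

$$ 1 = \frac{1}{2\pi i} \oint_\Gamma \frac{dw}{w - \alpha}
     = \frac{1}{2\pi i} \oint_\Gamma \frac{dw}{w - \beta} $$

for any $\beta$ satisfying $|\beta - \alpha| < \varepsilon$ with sufficiently small
$\varepsilon$. If we take $\delta' < \delta$ so that

$$ D = \{ z;\ |z - z_0| < \delta' \} \subset f^{-1}\left[ D^* = \{ w;\ |w - \alpha| < \varepsilon \} \right], $$

it follows that for any $z_1, z_2 \in D$,

$$ \frac{1}{2\pi i} \oint_\Gamma \frac{dw}{w - f(z_1)}
   = \frac{1}{2\pi i} \oint_\Gamma \frac{dw}{w - f(z_2)} = 1, $$

or equivalently,

$$ \frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z) - f(z_1)} \, dz
   = \frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z) - f(z_2)} \, dz = 1. $$

This means that each function $f(z) - f(z_1)$ and $f(z) - f(z_2)$ has
only one zero inside the circle $|z - z_0| = \delta$. Therefore, we conclude
that $f(z_1) \neq f(z_2)$ if $z_1 \neq z_2$. ♣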



10.2 Elementary Transformations 


10.2.1 Linear Transformations 


The simplest conformal mapping $w = f(z)$ would be the following:


@ Linear transformation:
$$ w = \alpha z + \beta, \tag{10.11} $$


where $\alpha$ ($\neq 0$) and $\beta$ are complex numbers.


A linear transformation generates a translation plus a magnification and a 
rotation of a polygon, but does not affect its shape. Thus, for example, a line 
maps to a line, a rectangle maps to a rectangle, a circle maps to a circle, etc. 

To appreciate the above statement, we first consider the particular case of
$\alpha = 1$. From (10.11), we have


$$ w = z + \beta, \tag{10.12} $$


which describes a translation by the constant $\beta$ of the points being mapped.
Obviously, a translation does not modify the length of a line or its orientation;
it only changes the position of the line with respect to the coordinate axes.
Since a polygon is constructed from three or more lines, the size and orientation
of a polygon are not affected by a translation; only the position of the polygon
is changed.

Next we consider the case of $\beta = 0$. When we express $\alpha$ in polar form, the
linear transformation becomes


$$ w = |\alpha| e^{i\varphi} z $$
with a constant argument $\varphi = \arg \alpha$. Then, the line between two points
transforms as


$$ w_1 - w_2 = |\alpha| e^{i\varphi} (z_1 - z_2) = |\alpha| \, |z_1 - z_2| \, e^{i(\varphi + \theta)}, $$
where $\theta = \arg(z_1 - z_2)$. Therefore, the length of a line in the z-plane,
$|z_1 - z_2|$, becomes magnified by a constant factor $|\alpha|$ and the line is
rotated through an angle $\varphi$. Thus, the lengths of the sides of a polygon and
the orientation of the polygon with respect to the axes are modified.
Nevertheless, its shape remains unchanged by the linear transformation with
$\beta = 0$.

We have seen that the values of $\alpha$ and $\beta$ straightforwardly determine the
image of a polygon in the z-plane under a particular linear transformation.
Conversely, if one knows the coordinates of two points on the original polygon
in the z-plane and the images of those two points in the w-plane, one can
determine $\alpha$ and $\beta$ and thus the linear transformation.
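
The converse statement can be sketched numerically. The helper below is a minimal illustration, not part of the text: the function name `linear_from_two_points` and the sample points are assumptions chosen here. It solves the two conditions $w_1 = \alpha z_1 + \beta$ and $w_2 = \alpha z_2 + \beta$ for the two unknowns.

```python
import cmath


def linear_from_two_points(z1, z2, w1, w2):
    """Return (alpha, beta) with w = alpha*z + beta sending z1 -> w1 and z2 -> w2.

    Hypothetical helper for illustration; it simply solves the two linear
    conditions w1 = alpha*z1 + beta and w2 = alpha*z2 + beta.
    """
    if z1 == z2:
        raise ValueError("z1 and z2 must be distinct")
    alpha = (w1 - w2) / (z1 - z2)
    beta = w1 - alpha * z1
    return alpha, beta


# Sample data (assumed): the segment from 0 to 1 is sent to the segment
# from 1+i to 1+3i, i.e. doubled in length and turned by a quarter revolution.
alpha, beta = linear_from_two_points(0j, 1 + 0j, 1 + 1j, 1 + 3j)
print("magnification |alpha| =", abs(alpha))
print("rotation arg(alpha)   =", cmath.phase(alpha))
print("translation beta      =", beta)
```

Here $|\alpha|$ is the magnification, $\arg\alpha$ the rotation angle, and $\beta$ the translation, in line with the discussion above.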


10.2.2 Bilinear Transformations 


There is another important conformal mapping referred to as the bilinear
transformation (or the fractional or Möbius transformation):


@ Bilinear transformation:


$$ w = \frac{\alpha z + \beta}{\gamma z + \delta}, \tag{10.13} $$


where $\alpha$, $\beta$, $\gamma$ and $\delta$ are complex numbers satisfying the relation
$\alpha\delta - \beta\gamma \neq 0$.


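The condition $\alpha\delta - \beta\gamma \neq 0$ ensures that

$$ \frac{df}{dz} = \frac{\alpha\delta - \beta\gamma}{(\gamma z + \delta)^2} $$

is nonzero at any finite point of the plane. Accordingly, the bilinear
transformation (10.13) possesses the one-to-one property because if
$f(z_1) = f(z_2)$, then

$$ \frac{\alpha z_1 + \beta}{\gamma z_1 + \delta} = \frac{\alpha z_2 + \beta}{\gamma z_2 + \delta}, $$

which implies $(\alpha\delta - \beta\gamma)(z_1 - z_2) = 0$, and thus $z_1 = z_2$.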


Remark. 


1. If $\gamma = 0$, the bilinear transformation (10.13) reduces to a linear
transformation, which has already been discussed. Thus, we require that
$\gamma \neq 0$ in what follows.

2. The function $f(z) = (\alpha z + \beta)/(\gamma z + \delta)$ serves as a general
solution (see Sect. 15.1.4) of the differential equation:


$$ \frac{f'''}{f'} - \frac{3}{2} \left( \frac{f''}{f'} \right)^2 = 0, $$
which is called the Schwarz differential equation.


Observe that the mapping (10.13) has two apparent exceptional points:
$z = \infty$, which lies outside the finite plane, and $z = -\delta/\gamma$, at which $w$
diverges. It is possible to weed out these exceptions by extending the
definition of conformal representation such that the point at infinity is
included. With such an extension, the conformal property of the transformation
(10.13) at the two points is recovered, even though the function $f(z)$ itself
diverges at $z = -\delta/\gamma$. Similarly, we can say that $w = f(z)$ transforms
the neighborhood of $z = \infty$ conformally into that of a point $w_0$ if
$w = \phi(\xi) = f(1/\xi)$ transforms the neighborhood of $\xi = 0$ conformally into
that of the point $w_0$.




A particularly interesting example of the bilinear transformation is


$$ w = f(z) = \frac{z - z_0}{z - z_0^*}, \tag{10.14} $$


where $\mathrm{Im}(z_0) \neq 0$. When $z_0$ lies in the upper half-plane, this
transformation maps the upper half of the z-plane, including the x-axis, onto
the unit circle (interior and circumference) centered at the origin of the
w-plane. This is demonstrated in Exercise 1.
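
A quick numerical spot check of this mapping property may help before working through the exercise. The sketch below is illustrative only: the function name `half_plane_to_disk`, the choice $z_0 = 1 + 2i$ (taken in the upper half-plane), and the sample points are assumptions made here.

```python
def half_plane_to_disk(z, z0):
    """Evaluate w = (z - z0) / (z - conj(z0)), the form of (10.14)."""
    return (z - z0) / (z - z0.conjugate())


z0 = 1 + 2j  # assumed sample value with Im(z0) > 0

# Points with Im z > 0 should land strictly inside the unit circle.
upper_samples = [0.5j, -3 + 1j, 10 + 0.01j, 2 + 7j]
print([abs(half_plane_to_disk(z, z0)) for z in upper_samples])

# Points on the real axis should land on the circumference |w| = 1.
real_samples = [-5.0, 0.0, 4.0]
print([abs(half_plane_to_disk(complex(x), z0)) for x in real_samples])

# The point z0 itself is sent to the center of the disk.
print(half_plane_to_disk(z0, z0))
```

Choosing instead a $z_0$ in the lower half-plane would send the upper half-plane to the exterior $|w| > 1$, which is why the sign of $\mathrm{Im}(z_0)$ matters here.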


10.2.3 Miscellaneous Transformations 


In what follows, we note several elementary transformations that facilitate a 
better understanding of the conformal nature of analytic functions. We shall 
see that any conformal transformation may be regarded as a transformation 
from Cartesian to orthogonal curvilinear coordinates. 


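Example 1. $w = z^2$, $w = \sqrt{z}$


Assume a conformal mapping defined by
$$ w = z^2. \tag{10.15} $$


Setting $z = x + iy$ and separating the real and imaginary parts, we have


$$ x^2 - y^2 = u, \quad 2xy = v. \tag{10.16} $$
Thus, the straight lines parallel to the y- and x-axes in the z-plane denoted
by

$$ x = a \quad \text{and} \quad y = b $$


are mapped onto parabolas in the w-plane given by


$$ u = a^2 - \frac{v^2}{4a^2} \quad \text{and} \quad u = \frac{v^2}{4b^2} - b^2, $$
respectively. (Conversely, the straight lines $u = \text{const.}$ and
$v = \text{const.}$ in the w-plane correspond to rectangular hyperbolas in the
z-plane.) This is shown schematically in Fig. 10.4.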
Another important feature of the mapping (10.15) is found by expressing
$z$ and $w$ in polar coordinates:
$$ z = \rho e^{i\phi}, \quad w = r e^{i\theta}. $$
On substitution in (10.15), we obtain
$$ r = \rho^2, \quad \theta = 2\phi. \tag{10.17} $$


Hence, the upper half of the z-plane, $0 \le \phi < \pi$, goes into the entire
w-plane, $0 \le \theta < 2\pi$; the lower half also goes into the entire w-plane. In
other words, points $z$ and $-z$ in the z-plane obviously go into the same point
in the w-plane. This suggests the possibility that some distinct geometric
figures in the z-plane may go into coincident figures in the w-plane.


Fig. 10.4. Mapping $w = z^2$
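
The statements about $w = z^2$ can be checked numerically. The sketch below is illustrative (function names and sample values are assumptions made here): it confirms that points of the vertical line $x = a$ land on the curve $u = a^2 - v^2/(4a^2)$, which follows from (10.16) by eliminating $y$, and that $z$ and $-z$ share one image.

```python
def square_map(z):
    """The mapping w = z**2 of (10.15)."""
    return z * z


def parabola_residual(a, y):
    """u - (a^2 - v^2 / (4 a^2)) for the image of the point z = a + i*y.

    From (10.16): u = a^2 - y^2 and v = 2*a*y, so the residual should vanish.
    """
    w = square_map(complex(a, y))
    u, v = w.real, w.imag
    return u - (a * a - v * v / (4 * a * a))


a = 1.5  # assumed sample line x = a
for y in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(y, parabola_residual(a, y))  # residuals at rounding level

# z and -z go into the same point of the w-plane.
z = 0.3 - 1.2j
print(square_map(z) == square_map(-z))
```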

Next we consider the transformation $w = \sqrt{z}$. In terms of polar forms, it
reads


$$ \sqrt{z} = \rho^{1/2} e^{i(\phi/2 + n\pi)}, $$
so that we have
$$ r = \sqrt{\rho}, \quad \theta = \frac{\phi}{2} + n\pi \quad (n = 0, 1). \tag{10.18} $$


Since $\theta$ advances only half as fast as $\phi$, one complete revolution in the
z-plane corresponds to a half revolution in the w-plane; the additional term
$n\pi$ in the latter equation in (10.18) supplies the other half of the w-plane.
This is a manifestation of the multivaluedness of the root function. The
mapping of the upper half of the z-plane onto the w-plane is illustrated
schematically in Fig. 10.5.


Fig. 10.5. Mapping $w = \sqrt{z}$




Example 2. $w = e^z$, $w = \log z$
In the case of
$$ w = e^z, \tag{10.19} $$


there are simple relationships between the Cartesian coordinates in the z-plane
and the polar coordinates in the w-plane:


$$ r e^{i\theta} = e^{x + iy} = e^x (\cos y + i \sin y); \quad \text{i.e.,} \quad r = e^x, \quad \theta = y. $$
The lines $x = \text{const.}$, parallel to the y-axis, become concentric circles in the
w-plane; the lines $y = \text{const.}$, parallel to the x-axis, become rays emerging
from the origin. Accordingly, a strip of the z-plane bounded by $y = y_0$ and
$y = y_0 + 2\pi$ goes into the entire w-plane.

The inverse of (10.19) is


$$ z = \log w, \quad x = \log r, \quad y = \theta + 2n\pi, $$


which is an infinitely many-valued function since the points $z$ obtained for
different integers $n$ all correspond to the same point in the w-plane.


Example 3. $w = \cosh z$
Next let us consider the following function:


$$ w = \cosh z. $$
The Cartesian coordinates in the two planes are related as follows:


$$ u + iv = \cosh(x + iy) = \cosh x \cos y + i \sinh x \sin y, $$
$$ u = \cosh x \cos y, \quad v = \sinh x \sin y. \tag{10.20} $$


Dividing the first equation by $\cosh x$, the second by $\sinh x$, squaring and
adding, we have an ellipse in the w-plane that corresponds to the straight
line $x = \text{const.}$ in the z-plane. Similarly, $y = \text{const.}$ goes into a hyperbola in
the w-plane. The equations of the ellipses and hyperbolas are


$$ \frac{u^2}{\cosh^2 x} + \frac{v^2}{\sinh^2 x} = 1, \quad
   \frac{u^2}{\cos^2 y} - \frac{v^2}{\sin^2 y} = 1. \tag{10.21} $$


The semimajor and semiminor axes of the ellipses are $\cosh x$ and $\sinh x$; the
semifocal distance is unity. The semiaxes of the hyperbolas are $\cos y$ and $\sin y$;
the semifocal distance is unity. Hence, equations (10.21) represent families of
confocal ellipses and hyperbolas. This transformation may be regarded as a
transformation from Cartesian to elliptic coordinates.
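
The two conic equations can be verified numerically for a sample point. The following is a minimal sketch; the function names and the sample values of $x$ and $y$ are assumptions chosen here, and points with $\sinh x = 0$, $\cos y = 0$ or $\sin y = 0$ are avoided because the corresponding conic degenerates there.

```python
import cmath
import math


def ellipse_residual(x, y):
    """u^2/cosh^2(x) + v^2/sinh^2(x) - 1 for w = cosh(x + i*y)."""
    w = cmath.cosh(complex(x, y))
    return (w.real / math.cosh(x)) ** 2 + (w.imag / math.sinh(x)) ** 2 - 1.0


def hyperbola_residual(x, y):
    """u^2/cos^2(y) - v^2/sin^2(y) - 1 for w = cosh(x + i*y)."""
    w = cmath.cosh(complex(x, y))
    return (w.real / math.cos(y)) ** 2 - (w.imag / math.sin(y)) ** 2 - 1.0


x, y = 0.7, 1.1  # assumed sample point
print(ellipse_residual(x, y), hyperbola_residual(x, y))  # both at rounding level

# Common semifocal distance: cosh^2 - sinh^2 = cos^2 + sin^2 = 1.
print(math.cosh(x) ** 2 - math.sinh(x) ** 2, math.cos(y) ** 2 + math.sin(y) ** 2)
```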


Example 4. $w = 1/z$
Consider the function
$$ w = \frac{1}{z} \tag{10.22} $$
and use rectangular coordinates to obtain
$$ (u + iv)(x + iy) = 1. $$
By equating real and imaginary parts, we get
$$ ux - vy = 1, \quad vx + uy = 0. $$


By an algebraic elimination first of $x$ and then of $y$, we arrive at the two
families of circles:


$$ u^2 + \left( v + \frac{1}{2y} \right)^2 = \frac{1}{4y^2}, \quad
   \left( u - \frac{1}{2x} \right)^2 + v^2 = \frac{1}{4x^2}. \tag{10.23} $$
The degenerate cases $x = 0$ and $y = 0$ cannot be handled by (10.23), but from
(10.22) we find that respectively, they give the two axes $u = 0$ and $v = 0$.
The transformation is shown in Fig. 10.6. Note that through the transformation,
the edge of the z-plane at infinity ($z = \infty$) is pulled into the origin
of the w-plane ($w = 0$), whereas the center of the z-plane is stretched out
in all directions to infinity in the w-plane. It is possible to visualize this
process by introducing an artificial concept, called “the point at infinity”; see
Sect. 10.1.5 for details.


Remark. The mapping $w = 1/z$ reverses the orientation of the circumference of
the circle to be mapped: $\arg(w) = -\arg(z)$. For example, the circumference
of $|w| = 1$ is described in the negative sense if $|z| = 1$ is described in the
positive sense.
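
As a numerical cross-check of the two circle families in (10.23) and of the orientation reversal, one may evaluate $w = 1/z$ at a few points. This is a sketch under assumed sample values; the function name `circle_residuals` is introduced here for illustration.

```python
import cmath


def circle_residuals(x, y):
    """Residuals of the two circle equations (10.23) for w = 1/(x + i*y).

    Returns (r_x, r_y): r_x tests the circle belonging to the line x = const.,
    r_y the circle belonging to the line y = const. Both should vanish.
    """
    w = 1 / complex(x, y)
    u, v = w.real, w.imag
    r_y = u * u + (v + 1 / (2 * y)) ** 2 - 1 / (4 * y * y)
    r_x = (u - 1 / (2 * x)) ** 2 + v * v - 1 / (4 * x * x)
    return r_x, r_y


print(circle_residuals(0.7, -1.3))
print(circle_residuals(-2.0, 0.4))

# Orientation reversal: arg(w) = -arg(z) (away from the negative real axis).
z = cmath.exp(0.9j)
print(cmath.phase(1 / z), -cmath.phase(z))
```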


Fig. 10.6. Mapping w = 1/z 




10.2.4 Mapping of Finite-Radius Circle 


Remember that the analyticity of functions is characterized by the isotropy
of their derivatives. Owing to the isotropy, infinitely small circles on the
z-plane are transformed into infinitely small circles on the w-plane through
any analytic function $w = f(z)$. Of course, this shape-preserving behavior
disappears when the circle has a finite radius, because the scale factor $h$
generally depends on $z$. Nevertheless, there exists a class of nontrivial
analytic functions that transform a finite circle on the z-plane onto a circle
on the w-plane, namely the bilinear transformations.


@ Theorem: 
Bilinear transformations w = f(z) map circles (or straight lines) on the 
z-plane onto circles (or straight lines) on the w-plane. 


Proof Our proof is based on the fact that the bilinear transformation formula
(10.13) can be rewritten (for $\gamma \neq 0$) as


$$ w = \frac{\alpha}{\gamma} + \frac{\beta\gamma - \alpha\delta}{\gamma^2} \, \frac{1}{z + \delta/\gamma}. $$


This is composed of a sequence of transformations of the following types:


1. $w = z + b$, a simple translation of the plane by the complex vector $b$.

2. $w = az$, a rotation of the plane through the angle $\arg a$, followed by an
expansion (or contraction) by $|a|$.

3. $w = 1/z$, an inversion that takes the interior of the unit circle to the
exterior and vice versa.


Since each of these transformations maps circles (or straight lines) onto
circles (or straight lines), so does their composition. ♣
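
The decomposition used in the proof can be sketched in code. The example below assumes the identity $w = \alpha/\gamma + [(\beta\gamma - \alpha\delta)/\gamma^2] \cdot 1/(z + \delta/\gamma)$, which is algebraically equivalent to $(\alpha z + \beta)/(\gamma z + \delta)$ for $\gamma \neq 0$; the function names and coefficient values are illustrative assumptions.

```python
def mobius(z, a, b, c, d):
    """Direct evaluation of w = (a*z + b) / (c*z + d)."""
    return (a * z + b) / (c * z + d)


def mobius_by_steps(z, a, b, c, d):
    """Same map built from the three elementary transformations (requires c != 0)."""
    z = z + d / c                       # 1. translation by d/c
    z = 1 / z                           # 3. inversion
    z = (b * c - a * d) / (c * c) * z   # 2. rotation and magnification
    return z + a / c                    # 1. translation by a/c


# Assumed sample coefficients with a*d - b*c != 0.
a, b, c, d = 2 + 1j, 1.0, 1j, 3.0
for z in (0.0, 1 - 2j, -4 + 0.5j):
    print(mobius(z, a, b, c, d), mobius_by_steps(z, a, b, c, d))
```

Both evaluations agree up to rounding, except at the pole $z = -\delta/\gamma$, where the inversion step divides by zero.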


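Remark. Statement 3 above regarding the inversion $w = 1/z$ is verified by
considering the equation


$$ \alpha(x^2 + y^2) + \beta x + \gamma y + \delta = 0 $$


with real coefficients, which represents a circle ($\alpha \neq 0$) or straight line
($\alpha = 0$) in the z-plane. This can be written as


$$ \alpha |z|^2 + \frac{\beta}{2}(z + z^*) + \frac{\gamma}{2i}(z - z^*) + \delta = 0. \tag{10.24} $$


Then, the transformation $w = 1/z$ maps it onto
$$ \delta |w|^2 + \frac{\beta}{2}(w + w^*) - \frac{\gamma}{2i}(w - w^*) + \alpha = 0, $$


which is a circle ($\delta \neq 0$) or a straight line ($\delta = 0$).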


10.2.5 Invariance of the Cross ratio 


The following peculiarity of a Möbius transformation serves as a useful device
in applications of conformal mapping.


@ Invariance of the cross ratio:
Any Möbius transformation $w = f(z)$ that maps the four points $z_i$
($i = 1,2,3,4$) into $w_i$ ($i = 1,2,3,4$), respectively, satisfies


$$ \frac{(w_1 - w_4)(w_3 - w_2)}{(w_1 - w_2)(w_3 - w_4)}
   = \frac{(z_1 - z_4)(z_3 - z_2)}{(z_1 - z_2)(z_3 - z_4)} = \lambda. $$


The constant $\lambda$ is called the cross ratio (or anharmonic ratio).


Proof Let $z_i$ ($i = 1,2,3,4$) be four distinct finite points on the z-plane and
let $w_i$ ($i = 1,2,3,4$) be their corresponding images through a Möbius
transformation. Then, for any two of the points, we have

$$ w_i - w_j = \frac{\alpha z_i + \beta}{\gamma z_i + \delta} - \frac{\alpha z_j + \beta}{\gamma z_j + \delta}
   = \frac{(\alpha\delta - \beta\gamma)(z_i - z_j)}{(\gamma z_i + \delta)(\gamma z_j + \delta)}, $$


and, consequently, for all four,


$$ \frac{(w_1 - w_4)(w_3 - w_2)}{(w_1 - w_2)(w_3 - w_4)}
   = \frac{(z_1 - z_4)(z_3 - z_2)}{(z_1 - z_2)(z_3 - z_4)}. \tag{10.25} $$


This clearly ensures the invariance of the cross ratio $\lambda$ under the Möbius
transformation. ♣


Remark. If one of the points $w_i$, say $w_1$, is the point at infinity, the
corresponding result is obtained by letting $w_1 \to \infty$ in (10.25). The left-hand
side then takes the form
$$ \frac{w_3 - w_2}{w_3 - w_4}. $$
This expression is to be regarded as the cross ratio of the points $\infty, w_2, w_3, w_4$.
A similar remark applies if one of the points $z_i$ is the point at infinity.


If $z_4$ is taken to be a variable $z$, then the corresponding image $w_4$ on the
w-plane becomes a function of $z$ that obeys the relation


$$ \frac{(w_1 - w)(w_3 - w_2)}{(w_1 - w_2)(w_3 - w)}
   = \frac{(z_1 - z)(z_3 - z_2)}{(z_1 - z_2)(z_3 - z)}. \tag{10.26} $$
By solving (10.26) for $w$, we can verify that it transforms the three points
$z_1, z_2, z_3$ into the corresponding points $w_1, w_2, w_3$. In this context, the
expression (10.26) turns out to show that a Möbius transformation is uniquely
determined by three correspondences. Since a circle is uniquely determined by
three points on its circumference, (10.26) can be used to find Möbius
transformations that map a given circle determined by $z_i$ ($i = 1,2,3$) onto a
second given circle (or straight line) determined by $w_i$ ($i = 1,2,3$).
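
Solving the cross-ratio relation for $w$ can be sketched as follows. Writing $R(z) = (z_1 - z)(z_3 - z_2)/[(z_1 - z_2)(z_3 - z)]$ and $K = (w_3 - w_2)/(w_1 - w_2)$, the relation $K (w_1 - w) = R (w_3 - w)$ gives $w = (R w_3 - K w_1)/(R - K)$. The function names and the sample correspondences below are assumptions for illustration, and only finite points are handled.

```python
def cross_ratio(p1, p2, p3, p4):
    """Cross ratio in the ordering used in the text."""
    return ((p1 - p4) * (p3 - p2)) / ((p1 - p2) * (p3 - p4))


def mobius_from_three_points(zs, ws):
    """Return f with f(z1) = w1, f(z2) = w2 and f(z) -> w3 as z -> z3.

    Sketch for finite z_i, w_i only; f cannot be evaluated exactly at z3,
    and it diverges where R(z) = K (the pole of the transformation).
    """
    z1, z2, z3 = zs
    w1, w2, w3 = ws
    k = (w3 - w2) / (w1 - w2)

    def f(z):
        r = ((z1 - z) * (z3 - z2)) / ((z1 - z2) * (z3 - z))
        return (r * w3 - k * w1) / (r - k)

    return f


# Assumed sample correspondences: 0 -> 1, 1 -> -1, -1 -> 0.
f = mobius_from_three_points((0, 1, -1), (1, -1, 0))
print(f(0), f(1), f(2))

# The cross ratio of four points is unchanged by f.
pts = [0.3 + 0.1j, -1 + 2j, 2 - 1j, 0.5j]
print(cross_ratio(*pts), cross_ratio(*[f(p) for p in pts]))
```

For these sample correspondences the construction reproduces $w = -(z + 1)/(3z - 1)$, as can be confirmed by substitution.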


Example If we take $z_1 = 1$, $z_2 = i$, $z_3 = -1$ and $w_1 = 0$, $w_2 = 1$, $w_3 = \infty$, we
obtain the transformation
$$ w = i \, \frac{1 - z}{1 + z}. $$


This maps the circle $|z| = 1$ on the real axis and the interior $|z| < 1$ of the
unit circle on the upper half of the w-plane.


Exercises 


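1. Consider the function $w = f(z) = (z - z_0)/(z - z_0^*)$ with $\mathrm{Im}(z_0) \neq 0$
and $z_0$ in the upper half-plane. Show that it maps the region
$\mathrm{Im}\, z > 0$ onto $|w| < 1$.


Solution: Set $z = x$ to obtain


$$ |w|^2 = \frac{x - z_0}{x - z_0^*} \left( \frac{x - z_0}{x - z_0^*} \right)^*
         = \frac{x - z_0}{x - z_0^*} \cdot \frac{x - z_0^*}{x - z_0} = 1. $$
That is, the image of the x-axis is the circumference of the unit
circle centered at the origin of the w-plane.
Next we evaluate the image of a point off the x-axis in the
upper half of the z-plane. Expressing $z$ and $z_0$ in polar form,
$z = r e^{i\theta}$ and $z_0 = r_0 e^{i\theta_0}$, we have


$$ |w|^2 = \frac{(r e^{i\theta} - r_0 e^{i\theta_0})(r e^{-i\theta} - r_0 e^{-i\theta_0})}
               {(r e^{i\theta} - r_0 e^{-i\theta_0})(r e^{-i\theta} - r_0 e^{i\theta_0})}
         = \frac{\xi_1 - \xi_2}{\xi_1 + \xi_2}, \tag{10.27} $$


where

$$ \xi_1 = r^2 + r_0^2 - 2 r r_0 \cos\theta \cos\theta_0 \quad \text{and} \quad \xi_2 = 2 r r_0 \sin\theta \sin\theta_0. $$
Since $-1 < \cos\theta \cos\theta_0 < 1$, we have

$$ (r - r_0)^2 < r^2 + r_0^2 - 2 r r_0 \cos\theta \cos\theta_0 = \xi_1, \quad \text{i.e.,} \quad \xi_1 > 0. $$


In addition, since $z$ and $z_0$ are in the upper half-plane, both $\sin\theta$
and $\sin\theta_0$ are positive, so $\xi_2 > 0$. Consequently, we have


$$ |w|^2 < 1, $$


which means that the images of points in the upper half of the
z-plane are located in the interior of the unit origin-centered
circle. ♣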




Remark. If $z_0$ were real, all points $z$ would be mapped onto the single point
$w = 1$, which is the reason we assumed $\mathrm{Im}(z_0) \neq 0$ in the first place.


2. Show that $w = (z - z_0)/(z_0^* z - 1)$ in which $|z_0| < 1$ maps $|z| < 1$ onto
$|w| < 1$ and $z = z_0$ onto $w = 0$.
Solution: Observe that


$$ 1 - |w|^2 = \frac{|z_0^* z - 1|^2 - |z - z_0|^2}{|z_0^* z - 1|^2}
             = \frac{(1 - |z|^2)(1 - |z_0|^2)}{|z_0^* z - 1|^2}. $$
Hence, $|z| = 1$ corresponds to $|w| = 1$. In addition, $z = z_0$
corresponds to $w = 0$. These mean that $|z| < 1$ is transformed onto
$|w| < 1$. ♣

3. Let $C$ and $C^*$ be two simple closed contours in the z- and the w-plane,
respectively, and let $w = f(z)$ be analytic within and on $C$. If $w = f(z)$
maps $C$ onto $C^*$ in such a way that $C^*$ is traversed by $w$ exactly once in
the positive sense under the condition that $z$ describes $C$ in the positive
sense, then $w = f(z)$ maps the domain bounded by $C$ onto the domain
bounded by $C^*$.

Solution: We denote the domains bounded by $C$ and $C^*$ by $D$
and $D^*$, respectively. Then it suffices to prove that every point of
$D^*$ is taken exactly once if $z$ is in $D$. Recall that the number $n$ of
zeros of the function $f(z) - w_0$ in $D$ is given by


$$ n = \frac{1}{2\pi i} \oint_C \frac{f'(z)}{f(z) - w_0} \, dz. $$
With the substitution $w = f(z)$, $f'(z)\,dz = dw$, this is rewritten
as


$$ n = \frac{1}{2\pi i} \oint_{C^*} \frac{dw}{w - w_0}, $$


where the integration has to be extended over the contour $C^*$ into
which $C$ is transformed by $w = f(z)$. By the residue theorem, the
value of this expression is 1 if $w_0$ is within $C^*$ and 0 if $w_0$ is outside
$C^*$. This shows that every point in $D^*$ is taken exactly once and
that a value outside $D^*$ is not taken at all. This completes the
proof. ♣
4. Find a conformal mapping $w = f(z)$ of the region between the two circles
$|z| = 1$ and $|z - (1/4)| = 1/4$ onto an annulus $a < |w| < 1$.

Solution: To solve this, we have to find a bilinear transformation
that simultaneously maps $|z| < 1$ onto $|w| < 1$ and $|z - (1/4)| < 1/4$
onto a disc of the form $|w| < a$. Note that


$$ f(z) = \frac{z - \alpha}{1 - \alpha^* z} $$
maps $|z| < 1$ onto $|w| < 1$, and that
$$ f(z) = a \, \frac{4z - 1 - \beta}{1 - \beta^* (4z - 1)} $$


maps $|z - (1/4)| < 1/4$ onto a disc of the form $|w| < a$.
Equating coefficients leads us to $a = 2 - \sqrt{3}$ (with $\alpha = a$, so that
one admissible mapping is $w = (z - a)/(1 - az)$). ♣
5. Find the bilinear transformation that maps $z = 0, 1, -1$ onto $w = 1, -1, 0$,
respectively.
Solution: Set $[z, 0, 1, -1] = [w, 1, -1, 0]$ to obtain $w = -(z + 1)/(3z - 1)$. ♣


6. Show that four distinct arbitrary points on the z-plane can be mapped
through the bilinear transformation onto $w = 1, -1, c, -c$ on the w-plane,
where $c$ is a complex number depending on the cross ratio $\lambda$ of the
mapping. Determine an explicit form of $c$ as a function of $\lambda$.

Solution: Let $[z_1, z_2, z_3, z_4] = [1, -1, c, -c]$ to obtain
$c = (1 + \lambda \pm 2\sqrt{\lambda})/(1 - \lambda)$, where the two roots satisfy $c_1 c_2 = 1$. ♣
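
The answer to Exercise 4 can be spot-checked numerically. The sketch below assumes the candidate mapping $w = (z - a)/(1 - az)$ with $a = 2 - \sqrt{3}$ (real, so $a^* = a$); the function name `to_annulus` and the sampled angles are illustrative assumptions. Points of the inner circle $|z - 1/4| = 1/4$ should land on $|w| = a$ and points of $|z| = 1$ on $|w| = 1$.

```python
import cmath
import math

a = 2 - math.sqrt(3)  # radius of the inner boundary of the annulus


def to_annulus(z):
    """Candidate mapping w = (z - a) / (1 - a*z) for Exercise 4 (assumed form)."""
    return (z - a) / (1 - a * z)


angles = (0.0, 1.0, 2.5, 4.0)
inner = [0.25 + 0.25 * cmath.exp(1j * t) for t in angles]  # |z - 1/4| = 1/4
outer = [cmath.exp(1j * t) for t in angles]                # |z| = 1

print([abs(to_annulus(z)) for z in inner], "expected radius", a)
print([abs(to_annulus(z)) for z in outer], "expected radius", 1.0)
```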


10.3 Applications to Boundary-Value Problems 


10.3.1 Schwarz—Christoffel Transformation 


In the preceding section, we discussed rich properties of the bilinear
transformation that can transform the upper half of the z-plane onto the unit
circle of the w-plane. Now we turn to a similarly important class of mappings
called the Schwarz–Christoffel transformation (abbreviated SC transformation),
which transforms the upper (or lower) half of the z-plane onto the
inside of an n-sided polygon drawn on the w-plane. This transformation is
defined by the following integral:


$$ w(z) = \beta + \alpha \int^{z} \prod_{i=1}^{n} (z' - x_i)^{-\theta_i/\pi} \, dz'. \tag{10.28} $$


Here $x_i$ ($1 \le i \le n$) are $n$ distinct fixed points along the x-axis, and the angle
$\theta_i$ is defined as shown in Fig. 10.7, being either positive or negative according
to whether we follow the boundary of the polygon counterclockwise or clockwise.
(For example, $\theta_1$ and $\theta_2$ are positive, but $\theta_3$ is negative in Fig. 10.7.)
The constant $\alpha$ gives rise to a magnification of that image by a factor $|\alpha|$
and a rotation of that image by an angle $\arg(\alpha)$. The constant $\beta$ generates a
translation of the magnified and rotated image.




Fig. 10.7. Schwarz—Christoffel transformation of the real axis of the z-plane to a 
polygon on the w-plane 


Remark. If we wish to transform the upper half of the z-plane into the exterior
of the polygon in the w-plane, it suffices to define


$$ w(z) = \beta + \alpha \int^{z} (z' - x_1)^{\theta_1/\pi} (z' - x_2)^{\theta_2/\pi} \cdots (z' - x_n)^{\theta_n/\pi} \, dz', $$
where the $\theta$'s are assigned the same values as in the preceding case.


Example The function


$$ w = f(z) = \int_0^z \frac{d\xi}{\sqrt{(1 - \xi^2)(1 - k^2 \xi^2)}} \quad (0 < k < 1) \tag{10.29} $$


maps the upper half of the z-plane ($\mathrm{Im}\, z > 0$) into the interior of a rectangle
on the w-plane. In fact, (10.29) is obtained by putting $n = 4$, $\theta_i = \pi/2$ for
all $i = 1,2,3,4$ in the definition (10.28), followed by setting $x_1 = 1$, $x_2 = -1$,
$x_3 = 1/k$ and $x_4 = -1/k$, all of which are located on the real axis. The integral
in (10.29) is called an elliptic integral of the first kind.
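
To make the rectangle concrete, the image of the point $x_1 = 1$, which is one vertex of the rectangle, can be evaluated numerically. The sketch below assumes the integrand $1/\sqrt{(1 - \xi^2)(1 - k^2\xi^2)}$ of the elliptic integral of the first kind; with $\xi = \sin t$ the integral from 0 to 1 becomes $\int_0^{\pi/2} dt/\sqrt{1 - k^2 \sin^2 t}$, the complete elliptic integral $K(k)$. As an independent cross-check (a technique not used in the text) the arithmetic-geometric-mean formula $K(k) = \pi / [2\,\mathrm{AGM}(1, \sqrt{1 - k^2})]$ is evaluated as well. Function names are illustrative.

```python
import math


def rectangle_half_width(k, n=2000):
    """Midpoint-rule value of int_0^{pi/2} dt / sqrt(1 - k^2 sin^2 t) = w(1)."""
    h = (math.pi / 2) / n
    return sum(h / math.sqrt(1.0 - (k * math.sin((j + 0.5) * h)) ** 2)
               for j in range(n))


def complete_elliptic_k(k, iterations=40):
    """K(k) from the arithmetic-geometric mean of 1 and sqrt(1 - k^2)."""
    a, b = 1.0, math.sqrt(1.0 - k * k)
    for _ in range(iterations):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return math.pi / (2.0 * a)


k = 0.5  # assumed sample modulus, 0 < k < 1
print(rectangle_half_width(k), complete_elliptic_k(k))
```

By the symmetry $\xi \to -\xi$ of the integrand, the point $x_2 = -1$ is sent to $-K(k)$, so the side of the rectangle lying on the real axis has length $2K(k)$.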


10.3.2 Derivation of the Schwarz–Christoffel Transformation


In order to derive equation (10.28) for the Schwarz–Christoffel transformation,
we let
$$ x_1 < x_2 < \cdots < x_n $$


be points on the real axis and consider the function $f(z)$ whose derivative is
$$ f'(z) = \alpha (z - x_1)^{-k_1} (z - x_2)^{-k_2} \cdots (z - x_n)^{-k_n}. \tag{10.30} $$
For this function we have
$$ \arg f'(z) = \arg \alpha - k_1 \arg(z - x_1) - k_2 \arg(z - x_2) - \cdots - k_n \arg(z - x_n). $$


Now, visualize the point $z$ as moving from left to right along the real axis,
starting to the left of the point $x_1$. When $z < x_1$, we have
$$ \arg(z - x_1) = \arg(z - x_2) = \cdots = \arg(z - x_n) = \pi, $$


whereas for $x_1 < z < x_2$, $\arg(z - x_1) = 0$, the others remaining at $\pi$. Hence, as
$z$ crosses $x_1$ from left to right, $\arg f'(z)$ increases by $k_1\pi$. It remains constant
for $x_1 < z < x_2$ and increases by $k_2\pi$ as $z$ crosses $x_2$, etc. As a result, the
image of the segment $-\infty < z < x_1$ becomes a straight line, the image of
$x_1 < z < x_2$ being another whose argument exceeds that of the first by $k_1\pi$,
and so on.

If we constrain the numbers $k_1, \cdots, k_n$ to lie between $-1$ and $1$, then
the increments in the argument of $f'(z)$ will lie between $-\pi$ and $\pi$. Further,
for $k_1 < 1, k_2 < 1, \cdots, k_n < 1$, it is obvious that the function $f(z)$ whose
derivative is (10.30) is actually continuous at each of the points $x_1, x_2, \cdots, x_n$.
Therefore, the image of the moving point $z$ will be a polygonal line. Finally,
integrate (10.30) to get the equation


$$ f(z) = \beta + \alpha \int^{z} (z' - x_1)^{-k_1} (z' - x_2)^{-k_2} \cdots (z' - x_n)^{-k_n} \, dz', \tag{10.31} $$
which maps the x-axis onto a polygonal line.
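
The turning of the image line at each point $x_i$ can be observed numerically. The sketch below assumes a derivative of the form $f'(z) = \alpha \prod_i (z - x_i)^{-k_i}$ and evaluates its argument just above the real axis (a tiny positive imaginary part keeps $\arg(z - x_i)$ equal to $\pi$ to the left of $x_i$). The function names, the sample points $x_i$, and the exponents $k_i$ are assumptions chosen for illustration; the $k_i$ are taken to sum to 2, which corresponds to a closed triangle.

```python
import cmath
import math


def sc_derivative(z, xs, ks, alpha=1.0):
    """f'(z) = alpha * prod_i (z - x_i)**(-k_i), principal branch of each power."""
    value = complex(alpha)
    for x, k in zip(xs, ks):
        value *= (z - x) ** (-k)
    return value


def direction_jumps(xs, ks, eps=1e-12):
    """Increase of arg f'(z), modulo 2*pi, each time z passes one of the x_i."""
    # One probe left of x_1, one between each neighboring pair, one right of x_n.
    probes = [xs[0] - 1.0]
    probes += [(p + q) / 2.0 for p, q in zip(xs, xs[1:])]
    probes += [xs[-1] + 1.0]
    args = [cmath.phase(sc_derivative(complex(t, eps), xs, ks)) for t in probes]
    return [(b - a) % (2.0 * math.pi) for a, b in zip(args, args[1:])]


xs = [-1.0, 0.0, 2.0]   # assumed points on the real axis
ks = [0.5, 0.75, 0.75]  # assumed exponents, sum = 2
print([j / math.pi for j in direction_jumps(xs, ks)])  # close to ks
```

Each printed value is the jump of $\arg f'(z)$ in units of $\pi$ and should be close to the corresponding $k_i$.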


Remark. 


1. The sum of the exterior angles of this polygonal line is
$$ k_1\pi + k_2\pi + \cdots + k_n\pi = \pi \sum_{i=1}^{n} k_i. $$


Hence, in order for the polygon to be closed, it is necessary that
$\sum_{i=1}^{n} k_i = 2$. In particular, when $k_i > 0$ for all $i$, the polygon is
convex.

2. The complex constants $\alpha$ and $\beta$ control the position, size, and orientation
of the polygon. Thus $\beta$ may be so chosen that one of the vertices of the
polygon will coincide with some specified point, e.g., the origin. Then $\alpha$
may be chosen so that one side of the polygon will be of given size and
parallel to a given direction.


10.3.3 The Method of Inversion 


The Schwarz–Christoffel transformation itself is applicable to polygons
composed of straight lines, but not to regions bounded by circular arcs.
Nevertheless, when combined with the method of inversion, the transformation
can be extended to regions bounded by circular arcs.




@ Inversion with respect to a circle: 
An inversion transformation $w = f(z)$ with respect to a circle $|z| = a$ is
defined by


$$ w = \frac{a^2}{z^*}, \tag{10.32} $$
through which the interior points of the circle are mapped onto exterior
points, and vice versa.


The inversion preserves the magnitude of the angle between two intersecting 
curves, but it reverses the sign of the angle. This is attributed to the fact 
that (10.32) consists of two successive transformations: the first a?/z, and the 
second a reflection with respect to the real axis. The first of these is conformal, 
whereas the second maintains the angle but reverses its sign. 

For the purpose of this section, we investigate the inversion of a circle of
radius $|z_0|$ centered at a point $z = z_0 \neq 0$ on the real axis. This circle is
expressed by


$$ |z - z_0| = |z_0| \tag{10.33} $$


or
$$ z^* z - z_0 (z + z^*) = 0. \tag{10.34} $$


Note that this circle passes through the origin, i.e., the center of an inversion
circle. Through the inversion (10.32), the circle (10.34) is mapped onto


$$ \frac{a^4}{w w^*} - z_0 \left( \frac{a^2}{w} + \frac{a^2}{w^*} \right) = 0. $$


Fig. 10.8. Inversion of the circle $|z - z_0| = |z_0|$ in (10.33) with respect to a circle
$|z| = a$ through the mapping $w = a^2/z^*$ given in (10.32)


By multiplying both sides by $w w^*$ and putting $w = u + iv$, we have
$$ a^4 - 2 a^2 z_0 u = 0, $$


or equivalently,
$$ u = \frac{a^2}{2 z_0}. $$
This means that by the inversion, the circle (10.33) is mapped onto a straight
line parallel to the imaginary axis of the w-plane (see Fig. 10.8).
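
This result is easy to confirm numerically. The sketch below assumes the inversion $w = a^2/z^*$ and a real center $z_0$, and samples points of the circle $|z - z_0| = |z_0|$ (avoiding the origin, which is sent to infinity); the function name `invert` and the values of $a$ and $z_0$ are illustrative assumptions.

```python
import cmath


def invert(z, a):
    """Inversion with respect to the circle |z| = a: w = a^2 / conj(z)."""
    return a * a / z.conjugate()


a, z0 = 2.0, 1.5  # assumed sample radius and (real) center
for t in (0.3, 1.0, 2.0, -2.5):
    z = z0 + abs(z0) * cmath.exp(1j * t)  # point on |z - z0| = |z0|
    w = invert(z, a)
    print(w.real, "expected", a * a / (2 * z0), "| imaginary part", w.imag)

# Interior and exterior are exchanged, since |w| * |z| = a^2.
z_inside = 0.4 - 0.3j
print(abs(invert(z_inside, a)) * abs(z_inside))
```

The real part of every image point equals $a^2/(2z_0)$ while the imaginary part varies, which is the vertical straight line found above.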

The role that inversion plays in extending the Schwarz–Christoffel
transformation should now be clear. Assume two intersecting circular arcs such as P
and Q in Fig. 10.9 and a circle R of radius $a$ whose center is the intersection
of the two circular arcs. Then, by an inversion with respect to R, the point
at the intersection is transformed into the point at infinity, the arcs
themselves being transformed into the solid portions of the lines P' and Q'. As a
result, the Schwarz–Christoffel transformation may now be applied to these
two straight lines, whereas it may not be applied to the original circular arcs.


Exercises 


1. Find a transformation that maps the upper half of the z-plane onto the
triangular region shown in Fig. 10.10 in such a way that the points $x_1 = -1$
and $x_2 = 1$ are mapped onto the points $w = -a$ and $w = a$, respectively,
and the point $x_3 = \pm\infty$ is mapped onto $w = ib$.


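Solution: Let us denote the angles at $w_1$ and $w_2$ in the w-plane
by $\phi_1 = \phi_2 = \phi$, where $\phi = \tan^{-1}(b/a)$. Since $x_3$ is taken at
infinity we may omit the corresponding factor in (10.28) to obtain

$$ w = \beta + \alpha \int_0^z (\xi + 1)^{-\phi/\pi} (\xi - 1)^{-\phi/\pi} \, d\xi
     = \beta + \alpha \int_0^z (\xi^2 - 1)^{-\phi/\pi} \, d\xi. \tag{10.35} $$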


Fig. 10.9. Inversion of circular arcs P and @ with respect to the circle R 




Fig. 10.10. Mapping of the upper half of the z-plane onto a certain limited region 
of the w-plane 


The required transformation may then be found by fixing the
constants $\alpha$ and $\beta$ as follows. Since the point $z = 0$ lies on the
line segment $x_1 x_2$ it will be mapped onto the line segment $w_1 w_2$
in the w-plane, and by symmetry must be mapped onto the point
$w = 0$. Thus setting $z = 0$ and $w = 0$ in (10.35), we obtain $\beta = 0$.
An expression for $\alpha$ can be found by considering the region in
the w-plane in Fig. 10.10 to be the limiting case of the triangular
region with the vertex $w_3$ at infinity. Thus we may use the above,
but with the angles at $w_1$ and $w_2$ set to $\phi = \pi/2$. From (10.35),
we obtain $w = \alpha \int_0^z d\xi / \sqrt{\xi^2 - 1} = i\alpha \sin^{-1} z$. By setting $z = 1$
and $w = a$, we find $i\alpha = 2a/\pi$, so the required transformation is
$w = (2a/\pi) \sin^{-1} z$. ♣

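2. Find the conformal mapping that transforms the interior of the circle
$|z| < 1$ to the interior of a polygon on the w-plane, subject to the condition
that the points $z_1, z_2, \cdots, z_n$ lying on the circle $|z| = 1$ are mapped,
respectively, onto the vertices $w_1, w_2, \cdots, w_n$ of the polygon.

Solution: Consider first the transformation
$$ \tau = -i \, \frac{z + i}{z - i}, \tag{10.36} $$


which maps $|z| < 1$ onto $\mathrm{Im}\, \tau > 0$. It yields


$$ \frac{d\tau}{dz} = -\frac{2}{(z - i)^2} \quad \text{and} \quad
   \tau - \tau_j = \frac{2(z_j - z)}{(z - i)(z_j - i)} \quad (j = 1, 2, \cdots, n). \tag{10.37} $$
Next, we assume that through (10.36), the points $z_1, z_2, \cdots, z_n$ are
mapped, respectively, onto the points $\tau_1, \tau_2, \cdots, \tau_n$ that are
located on the line $\mathrm{Im}\, \tau = 0$. Then, the transformation that maps
$\mathrm{Im}\, \tau > 0$ onto the interior of a polygon on the w-plane is given by
$w = w(\tau)$, whose derivative reads


$$ \frac{dw}{d\tau} = \alpha (\tau - \tau_1)^{(k_1/\pi) - 1} (\tau - \tau_2)^{(k_2/\pi) - 1} \cdots (\tau - \tau_n)^{(k_n/\pi) - 1}. \tag{10.38} $$

Here $k_i$ is the internal angle of the polygon at the $i$th vertex, which
satisfies $\sum_{i=1}^{n} k_i = (n - 2)\pi$. From (10.37) and (10.38), and using
$\sum_{j=1}^{n} [(k_j/\pi) - 1] = -2$, we have


$$ \frac{dw}{dz} = -\frac{2\alpha}{(z - i)^2} \cdot \frac{1}{2^2} \cdot
   \frac{(z_1 - z)^{(k_1/\pi) - 1} \cdots (z_n - z)^{(k_n/\pi) - 1} \, (z - i)^2}
        {(z_1 - i)^{(k_1/\pi) - 1} \cdots (z_n - i)^{(k_n/\pi) - 1}}. $$


Replace $(\alpha/2)(z_1 - i)^{1 - (k_1/\pi)} \cdots (z_n - i)^{1 - (k_n/\pi)}$, together with the
remaining constant sign and phase factors, by $\alpha$ to obtain
the final result:


$$ w = f(z) = \alpha \int_{z_0}^{z} (z' - z_1)^{(k_1/\pi) - 1} \cdots (z' - z_n)^{(k_n/\pi) - 1} \, dz' + \beta, $$

where $\alpha$ ($\neq 0$), $\beta$ are complex constants and $z_0 \neq z_1, \cdots, z_n$. ♣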
3. Prove that the function


$$ w = f(z) = \int_0^z \frac{d\xi}{(1 - \xi^6)^{1/3}} \tag{10.39} $$
maps the unit circle on the z-plane onto a regular hexagon on the w-plane.
Solution: Observe that $\xi^6 - 1 = (\xi - \xi_1) \cdots (\xi - \xi_6)$ with $|\xi_j| = 1$
($j = 1, \cdots, 6$). Similarly to Exercise 2 above, we map $|z| < 1$
onto $\mathrm{Im}\, \tau > 0$, and then let the points $\tau_1, \cdots, \tau_6$ located on the
line $\mathrm{Im}\, \tau = 0$ correspond to $\xi_1, \cdots, \xi_6$. Then, by setting $n = 6$ and
$k_i = (2/3)\pi$ for all $i$, we see that the transformation (10.39) maps
$|z| < 1$ onto a regular hexagon on the w-plane,
$(\sqrt[3]{2}/6)\,\Gamma^2(1/3)/\Gamma(2/3)$ on a side. ♣
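4. Suppose that $\phi(z)$ satisfies the Laplace equation and let $w = f(z)$ be a
conformal mapping. Then, show that the function


$$ \phi(w) = \phi(u, v) $$
also satisfies Laplace's equation in the w-plane; i.e.,


$$ \frac{\partial^2 \phi}{\partial u^2} + \frac{\partial^2 \phi}{\partial v^2} = 0. \tag{10.40} $$


Solution: Since $u = u(x, y)$ and $v = v(x, y)$, the partial derivative
$\partial/\partial x$ can be rewritten as $\partial/\partial x = u_x (\partial/\partial u) + v_x (\partial/\partial v)$,
where $u_x = \partial u/\partial x$ and $v_x = \partial v/\partial x$. It yields


$$ \frac{\partial^2 \phi}{\partial x^2}
   = \left( u_x \frac{\partial}{\partial u} + v_x \frac{\partial}{\partial v} \right)
     \left( u_x \frac{\partial}{\partial u} + v_x \frac{\partial}{\partial v} \right) \phi
   = (u_x)^2 \frac{\partial^2 \phi}{\partial u^2} + (v_x)^2 \frac{\partial^2 \phi}{\partial v^2}
     + 2 u_x v_x \frac{\partial^2 \phi}{\partial u \partial v}. \tag{10.41} $$
(The additional terms $u_{xx}\, \partial\phi/\partial u + v_{xx}\, \partial\phi/\partial v$ are not written
here; they cancel against the corresponding terms of $\partial^2\phi/\partial y^2$ because
$u$ and $v$ are harmonic.) Similarly, we have
$$ \frac{\partial^2 \phi}{\partial y^2}
   = (u_y)^2 \frac{\partial^2 \phi}{\partial u^2} + (v_y)^2 \frac{\partial^2 \phi}{\partial v^2}
     + 2 u_y v_y \frac{\partial^2 \phi}{\partial u \partial v}
   = (v_x)^2 \frac{\partial^2 \phi}{\partial u^2} + (u_x)^2 \frac{\partial^2 \phi}{\partial v^2}
     - 2 u_x v_x \frac{\partial^2 \phi}{\partial u \partial v}, \tag{10.42} $$


where we have used the Cauchy–Riemann relations: $u_x = v_y$, $u_y = -v_x$.
Adding up the sides of the second lines of (10.41) and (10.42),
we obtain


$$ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2}
   = \left[ (u_x)^2 + (v_x)^2 \right]
     \left( \frac{\partial^2 \phi}{\partial u^2} + \frac{\partial^2 \phi}{\partial v^2} \right). $$


The quantity inside the square brackets is equal to $|f'(z)|^2$, which
is nonzero wherever the mapping $f(z)$ is conformal. As a consequence, we
conclude that
$$ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} = 0
   \quad \Longrightarrow \quad
   \frac{\partial^2 \phi}{\partial u^2} + \frac{\partial^2 \phi}{\partial v^2} = 0. \quad ♣ $$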
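
The side length quoted in Exercise 3 can be cross-checked numerically. The sketch below rests on two assumptions that should be read as such: that (10.39) has the form $f(z) = \int_0^z (1 - \xi^6)^{-1/3} d\xi$, and that the side of the regular hexagon equals its circumradius $f(1)$ (the vertices being the images $f(e^{i\pi j/3}) = e^{i\pi j/3} f(1)$ of the sixth roots of unity). The endpoint singularity at $\xi = 1$ is removed by the substitution $\xi = 1 - u^3$, for which $1 - \xi^6 = u^3 g$ with $g = 1 + \xi + \cdots + \xi^5$, so the integrand becomes $3u\, g^{-1/3}$. The function name is illustrative.

```python
import math


def hexagon_vertex_distance(n=20000):
    """Midpoint-rule value of f(1) = int_0^1 (1 - t^6)^(-1/3) dt.

    Uses t = 1 - u^3, which turns the integrand into the smooth function
    3*u*g**(-1/3) with g = 1 + t + t^2 + t^3 + t^4 + t^5.
    """
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        u = (j + 0.5) * h
        t = 1.0 - u ** 3
        g = 1.0 + t + t ** 2 + t ** 3 + t ** 4 + t ** 5
        total += 3.0 * u * g ** (-1.0 / 3.0) * h
    return total


closed_form = 2 ** (1 / 3) / 6 * math.gamma(1 / 3) ** 2 / math.gamma(2 / 3)
print(hexagon_vertex_distance(), closed_form)
```

Both numbers agree to the accuracy of the quadrature, which supports the closed form $(\sqrt[3]{2}/6)\,\Gamma^2(1/3)/\Gamma(2/3)$ under the stated assumptions.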


10.4 Applications in Physics and Engineering 


10.4.1 Electric Potential Field in a Complicated Geometry 


The Schwarz–Christoffel transformation is useful in mathematical physics,
since it can be used to solve two-dimensional Laplace equations under
certain boundary conditions. In fact, there are many physical systems that are
described by Laplace's equation subject to Dirichlet or Neumann boundary
conditions. For example, Laplace's equation can be used to describe
heat conduction in a uniform medium, nonturbulent fluid flow, and an
electrostatic field in a uniform system. In this subsection, we demonstrate how
the Schwarz–Christoffel transformation works efficiently to solve such
two-dimensional Laplace equations. It should be emphasized that our method is
independent of the physical system being described. Here, we
apply the transformation to problems in electrostatics in order to illustrate
the method of solution, bearing in mind that these techniques are also
applicable to problems involving other physical systems.

The general procedure for determining the electrostatic potential by us- 
ing conformal mapping methods involves transforming a complicated charge- 
distribution geometry in the z-plane into a simple geometry in the w-plane. 
After solving the problem for the simpler geometry, the inverse transformation 
to the z-plane is applied to obtain the potential for the original geometry. 

As a concrete example, we consider a metal block with a cut-out wedge of
angle $\gamma$ as shown in Fig. 10.11. There is a vacuum inside the wedge. The block
extends to $\pm\infty$ in the direction perpendicular to the plane of the page. Since
charge moves freely inside a metal, all of the charge placed in the conductor is
distributed in such a way that the potential at all points along the edges of the
wedge is the same. We denote this potential by $\phi_0$; i.e., the system under
consideration is subject to the Dirichlet boundary conditions given by

\[ \phi(r, \theta = 0) = \phi(r, \theta = \gamma) = \phi_0. \tag{10.43} \]


Fig. 10.11. Wedge cut of a metal 


Our objective is to evaluate the potential $\phi(z)$ at points in the vacuum
region inside the wedge (defined by $0 < \arg(z) < \gamma$). This potential satisfies
the Laplace equation, and thus, it can be determined by conformal mapping
methods. For this purpose, we attempt to find the mapping that transforms
the wedge in the $z$-plane onto the real axis of the $w$-plane. We know that
the transformation of the real axis in the $z$-plane onto the wedge shown in
Fig. 10.11 is given by the Schwarz–Christoffel transformation:


\[ w = \beta + \alpha (z - x_1)^{-\theta_1/\pi}. \]

Therefore, the inverse mapping

\[ z = x_1 + \frac{1}{a} (w - \beta)^{-\pi/\theta_1}, \tag{10.44} \]

in which the constant $a$ is determined by $\alpha$, transforms the wedge in the
$w$-plane with an internal angle $-\theta_1$ onto the real axis of the $z$-plane.
By interchanging $z$ and $w$ in (10.44), we obtain the mapping

\[ w = u_1 + \frac{1}{a} (z - \beta)^{-\pi/\theta_1}, \tag{10.45} \]

which transforms the wedge in the $z$-plane onto the real axis of the $w$-plane.
In order to apply this mapping to the configuration shown in Fig. 10.11, we
set

\[ \gamma = -\theta_1, \quad u_1 = \beta = 0, \quad \text{and} \quad \frac{1}{a} = 1, \]

where $a$ is real. Then, the mapping in (10.45) becomes

\[ w = z^{\pi/\gamma}. \tag{10.46} \]


Remark. It immediately follows that the mapping in (10.46) transforms the
space within the wedge onto the upper half of the $w$-plane. This is because
points within the wedge that satisfy the condition $0 < \arg(z) = \theta < \gamma$ are
mapped onto $w = r^{\pi/\gamma} e^{i\pi\theta/\gamma}$, whose argument $\pi\theta/\gamma$ takes values in the
interval $(0, \pi)$.


Remember that the Dirichlet boundary condition is invariant under conformal
mappings. Hence, the boundary condition of (10.43) is mapped to

\[ \phi(u, v = 0) = \phi_0, \tag{10.47} \]

where $v = 0$ is the image of the edges of the wedge. As noted earlier, the mapping
in (10.46) transforms the problem of finding the potential in the region
within the wedge in Fig. 10.11 to that of finding the potential in the upper
half of the $w$-plane due to a flat metal surface that extends along the
entire $u$-axis and is maintained at a potential $\phi_0$ by a uniformly distributed
charge.

We now consider the "mapped" Laplace equation for the $w$-plane. Since all
points on the surface of the flat plate are at the same potential, the potential
at all points $(u, v)$ located at the same distance $v$ above the plate is the same.
Thus, the potential at any point must be independent of the value of $u$, and
the Laplace equation in the $w$-plane becomes

\[ \frac{d^2 \phi}{dv^2} = 0. \]

Integration of this differential equation followed by application of the
boundary condition (10.47) yields

\[ \phi(v) = \phi_0 + c v. \tag{10.48} \]


The constant $c$ is obtained by using the property that the derivative of the
potential (i.e., the electrostatic field) is a constant for a charged flat plate.
Similar to $\phi_0$, the value of this constant field $E_0$ depends on how much charge
is distributed over a given area on the plate. With reference to (10.48),

\[ \frac{d\phi}{dv} = c = -E_0, \quad \text{so that} \quad \phi(v) = \phi_0 - E_0 v. \]


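In order to complete the analysis, the potential must be expressed in terms
of the coordinates in the $z$-plane. From the expression

\[ v = \mathrm{Im}(w) = \mathrm{Im}\left( r^{\pi/\gamma} e^{i\pi\theta/\gamma} \right) = r^{\pi/\gamma} \sin(\pi\theta/\gamma), \]

the potential is given by

\[ \phi = \phi_0 - E_0 r^{\pi/\gamma} \sin(\pi\theta/\gamma)
        = \phi_0 - E_0 (x^2 + y^2)^{\pi/(2\gamma)} \sin\left[ \frac{\pi}{\gamma} \tan^{-1}\left( \frac{y}{x} \right) \right]. \tag{10.49} \]

This is the final solution to the problem in question. We see from (10.49) that
$\phi = \phi(r, \theta)$ is constant when

\[ r^{\pi/\gamma} \sin(\pi\theta/\gamma) = \text{const}. \]

This is the equation for a family of equipotential curves.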
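The closed-form potential (10.49) can be evaluated and sanity-checked with a few lines of code. The sketch below is illustrative only: the function names, the default values $\phi_0 = E_0 = 1$, and the finite-difference check are assumptions made here, and the use of `atan2` restricts the sketch to wedge angles $\gamma \le \pi$.

```python
import math


def wedge_potential(x, y, gamma, phi0=1.0, e0=1.0):
    """Potential (10.49) inside a wedge of opening angle gamma (gamma <= pi)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)  # polar angle, valid for points inside the wedge
    return phi0 - e0 * r**(math.pi / gamma) * math.sin(math.pi * theta / gamma)


def laplacian(func, x, y, h=1e-3):
    """Five-point finite-difference approximation of the 2D Laplacian."""
    return (func(x + h, y) + func(x - h, y) + func(x, y + h)
            + func(x, y - h) - 4.0 * func(x, y)) / h**2


if __name__ == "__main__":
    gamma = 2.0  # sample wedge angle in radians (hypothetical value)
    # On both edges of the wedge the potential should equal phi0.
    print(wedge_potential(1.5, 0.0, gamma))
    print(wedge_potential(1.5 * math.cos(gamma), 1.5 * math.sin(gamma), gamma))
    # Inside the wedge the Laplacian should be close to zero.
    print(laplacian(lambda x, y: wedge_potential(x, y, gamma),
                    math.cos(1.0), math.sin(1.0)))
```

Plotting level sets of `wedge_potential` would trace the equipotential curves $r^{\pi/\gamma} \sin(\pi\theta/\gamma) = \text{const}$.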


10.4.2 Joukowsky Airfoil 


Our final discussion related to the applications of conformal mappings concerns
the Joukowsky transformation, which is an important conformal
mapping that has been historically employed in the theory of airfoil design.
Here, the term “airfoil” refers to the cross-sectional shape of a wing (or a
propeller or a turbine). According to the literature on airfoil theory, any object
with an angle of attack in a moving fluid generates a lift, a force perpendicular
to the flow. Airfoils are designed as efficient shapes that increase the lift
that the object generates. The Joukowsky transformation maps a circle on
the complex plane into a family of airfoil shapes called Joukowsky airfoils,
which simplify the analysis of two-dimensional fluid flows around an airfoil
with a complicated geometry.

Fig. 10.12. The Joukowsky transformation (10.50) of the circle $|z - z_0| = |1 - z_0|$
with $z_0 = (-0.2, 0.2)$ to the airfoil indicated by the thick curve
The Joukowsky transformation $w = f(z)$ is defined by

\[ w = f(z) = z + \frac{1}{z}, \tag{10.50} \]

where $z$ is located on a circle $C$ that passes through the point $z = 1$ and
encloses the point $z = -1$ as well as the origin $z = 0$. Note that the center
of the circle, denoted by $z_0$, does not coincide with the origin, but is located
close to the origin. In fact, the coordinates of $z_0$ are variables, and changes in
these variables alter the geometry of the resulting airfoil. An example of an
airfoil generated by the transformation (10.50) is shown in Fig. 10.12, where
$z_0 = (-0.2, 0.2)$. We see that the circle $C: |z - z_0| = |1 - z_0|$ is mapped onto an
airfoil indicated by a thick curve. The streamlines for a flow around the airfoil
can be obtained by applying an inverse transformation to the streamlines for
a flow around the circle, and the latter can be easily evaluated.
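As an illustration of (10.50), the airfoil outline can be generated by sampling the circle $C$ and mapping each sample point. This is a minimal sketch: the function names and the number of sample points are assumptions made here, while the center $z_0 = -0.2 + 0.2i$ is the value quoted for Fig. 10.12.

```python
import cmath
import math


def joukowsky(z):
    """The Joukowsky transformation w = z + 1/z of (10.50)."""
    return z + 1.0 / z


def circle_points(z0, n=200):
    """Sample the circle |z - z0| = |1 - z0|, starting at the point z = 1."""
    radius = abs(1.0 - z0)
    start = cmath.phase(1.0 - z0)  # angle of z = 1 as seen from the center
    return [z0 + radius * cmath.exp(1j * (start + 2.0 * math.pi * k / n))
            for k in range(n)]


def airfoil(z0=complex(-0.2, 0.2), n=200):
    """Image of the circle under the Joukowsky transformation."""
    return [joukowsky(z) for z in circle_points(z0, n)]


if __name__ == "__main__":
    outline = airfoil()
    # The first sample is z = 1, which is mapped to w = 2.
    print(outline[0])
```

The list returned by `airfoil` can be passed to any plotting tool (real parts against imaginary parts) to reproduce the shape of the thick curve qualitatively.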


Part IV 


Fourier Analysis 


11 


Fourier Series 


Abstract A Fourier series is an expansion of a periodic function in terms of an 
infinite sum of sines and cosines. The use of a Fourier series allows us to break up an 
arbitrary periodic function into a set of simple terms that can be solved individually 
and then recombined in order to obtain the solution to the original problem with 
the desired level of accuracy. In this chapter, we place particular emphasis on the 
mean convergence property of a Fourier series (Sect. 11.2.1) and the conditions 
that are necessary for the series to be uniformly convergent (Sect. 11.3.1). Better 
understanding of convergence properties clarifies the reasons for the utility and the 
limit of validity of Fourier series expansion in mathematical physics. 


11.1 Basic Properties 


11.1.1 Definition 


Fourier series are infinite series consisting of trigonometric functions with a
particular definition of expansion coefficients. They can be applied to almost
all periodic functions whether the functions are continuous or not. With this
expansion, physical phenomena involving some periodicity are reduced to a
superposition of simple trigonometric functions, which helps us a great deal
in both arithmetic and practical respects. We begin this section with a description
of the basic properties of Fourier series. We follow this by considering the
convergence theory of Fourier series, which is the issue in the next section.

First of all, it is important to clarify the distinction between the following 
two concepts: trigonometric series and Fourier series. 


@ Trigonometric series: 


The series

\[ \frac{A_0}{2} + \sum_{n=1}^{\infty} (A_n \cos nx + B_n \sin nx) \]

is called a trigonometric series.




Here the set of coefficients $\{A_n\}$ and $\{B_n\}$ can be taken arbitrarily. (The
expression $A_0/2$ instead of $A_0$ is just due to our convention.) Among the
infinite choices of $\{A_n\}$ and $\{B_n\}$, a specific definition of the coefficients noted
below provides the Fourier series of a given function $f(x)$.


@ Fourier series: 
The series

\[ \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \tag{11.1} \]

is called a Fourier series of a function $f(x)$ if and only if the coefficients
are given by the Euler–Fourier formula expressed by

\[ a_n = \frac{1}{\pi} \int_0^{2\pi} f(x) \cos nx \, dx, \qquad
   b_n = \frac{1}{\pi} \int_0^{2\pi} f(x) \sin nx \, dx. \tag{11.2} \]

Accordingly, a Fourier series is a specific kind of trigonometric series whose
coefficients bear a definite relation (11.2) to some function $f(x)$. In (11.1) we
have written the constant term as $a_0/2$ rather than $a_0$, so that the expression
for $a_0$ is given by taking $n = 0$ in (11.2). There is no $b_0$, since $\sin(0 \cdot x) = 0$.
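The Euler–Fourier formula (11.2) can be evaluated numerically for a given function. The sketch below assumes integration over one period $[0, 2\pi]$ with a simple midpoint rule; the helper names and the sample function are hypothetical choices for illustration, not part of the text.

```python
import math


def integrate(g, a, b, n=2000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def euler_fourier(f, n_max):
    """Return the lists [a_0..a_nmax] and [b_0..b_nmax] defined by (11.2)."""
    a, b = [], []
    for n in range(n_max + 1):
        a.append(integrate(lambda x: f(x) * math.cos(n * x),
                           0.0, 2.0 * math.pi) / math.pi)
        b.append(integrate(lambda x: f(x) * math.sin(n * x),
                           0.0, 2.0 * math.pi) / math.pi)
    return a, b


def sample(x):
    """Sample trigonometric polynomial with known coefficients."""
    return 1.0 + 2.0 * math.cos(3.0 * x) - 0.5 * math.sin(x)


if __name__ == "__main__":
    a, b = euler_fourier(sample, 4)
    print(a)  # a_0/2 is the constant term, so a_0 should be close to 2
    print(b)
```

For the sample function the only nonzero coefficients are $a_0$, $a_3$, and $b_1$, which shows how (11.2) picks out each harmonic separately.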

By definition, every Fourier series is a trigonometric series. However, the 
converse is not true, as demonstrated below. 


Example It is known that the trigonometric series given by

\[ \sum_{n=2}^{\infty} \frac{\sin nx}{\log n} \]

is not a Fourier series. Indeed, no function can be related to the coefficient
$1/\log n$ via (11.2).


11.1.2 Dirichlet Theorem 


Emphasis should be placed on the fact that the definition of Fourier series
provides no information as to its convergence; thus the infinite series (11.1)
may converge or diverge depending on the behavior of the function $f(x)$.
This leads us to discuss which functions $f(x)$ make the series (11.1) convergent.
This issue is clarified in part by the following theorem (see also
Fig. 11.1):


Fig. 11.1. (a) Continuous and smooth function. (b) Continuous but nonsmooth
function. (c) Function with a finite number of discontinuities


@ Dirichlet theorem: 
If $f(x)$ is periodic with the period $2\pi$ and if $f'(x)$ is continuous or has at
most a finite number of discontinuities in $[0, 2\pi]$, then its Fourier series
converges to

1. $f(x)$, if $x$ is a point of continuity, or

2. $\dfrac{f(x + 0) + f(x - 0)}{2}$, if $x$ is a point of discontinuity.


The set of conditions noted above is called Dirichlet's conditions. It is worth
noting that the Dirichlet conditions are sufficient but not necessary. That
is, if the conditions are satisfied, the convergence of the series is guaranteed;
but if they are not satisfied, the series may or may not converge. An exact
proof of Dirichlet's theorem requires rather complicated calculations, which
will be demonstrated in the next section.


Remarks. 


1. The Dirichlet conditions do not require the continuity of $f(x)$ within
$[0, 2\pi]$.

2. Almost all periodic functions that we encounter in physical problems satisfy
the Dirichlet conditions; therefore, in practice the Fourier series expansion can
be used with little concern about its convergence.


It follows that if $f(x)$ is continuous within $[0, 2\pi]$ and satisfies Dirichlet's
conditions, then the Fourier series of $f(x)$ converges to $f(x)$ at all the points
within $[0, 2\pi]$. This means that the Fourier series of $f(x)$ converges uniformly
to $f(x)$. Once uniform convergence is ensured, we generally write

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \tag{11.3} \]

with the definition (11.2) for the coefficients. Consequently, if we form the
Fourier series of $f(x)$ without first examining its convergence to $f(x)$, we
should write

\[ f(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nx + b_n \sin nx) \tag{11.4} \]

instead of (11.3). The symbol "$\sim$" in (11.4) means that the series on the
right-hand side only corresponds to the function $f(x)$ and can be replaced by the
equality "$=$" only if we succeed in proving that the infinite series converges
uniformly to $f(x)$.
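The second statement of the Dirichlet theorem, convergence to the midpoint of the jump, can be observed on a simple example. The sketch below uses a square wave that equals 1 on $(0, \pi)$ and 0 on $(\pi, 2\pi)$; this test function and the function name are assumptions chosen here. Its coefficients follow from (11.2): $a_0 = 1$, $a_n = 0$ for $n \ge 1$, and $b_n = [1 - (-1)^n]/(n\pi)$.

```python
import math


def square_wave_partial_sum(x, n_max):
    """Partial sum of the Fourier series of the 0/1 square wave described above."""
    total = 0.5  # a_0 / 2
    for n in range(1, n_max + 1):
        b_n = (1.0 - (-1.0)**n) / (n * math.pi)
        total += b_n * math.sin(n * x)
    return total


if __name__ == "__main__":
    # At the jump x = pi the series gives the mean of the one-sided limits.
    print(square_wave_partial_sum(math.pi, 201))
    # At points of continuity it approaches the function value itself.
    print(square_wave_partial_sum(0.5 * math.pi, 2001))
    print(square_wave_partial_sum(1.5 * math.pi, 2001))
```

At $x = \pi$ every sine term vanishes, so the partial sums equal $1/2 = [f(\pi + 0) + f(\pi - 0)]/2$ for every truncation order.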


11.1.3 Fourier Series of Periodic Functions 


The preceding arguments were limited to the case of periodic functions with
period $2\pi$. But Fourier series expansions can apply to periodic functions whose
periods differ from $2\pi$. This is seen by replacing $x$ in (11.3) by $(2\pi/\lambda)x$, which
transforms a series convergent in the interval $[0, 2\pi]$ to another series convergent
in $[0, \lambda]$. The resulting Fourier series is

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} (a_n \cos nkx + b_n \sin nkx), \tag{11.5} \]

where $k = 2\pi/\lambda$ and

\[ a_n = \frac{2}{\lambda} \int_0^{\lambda} f(x) \cos nkx \, dx \quad \text{and} \quad
   b_n = \frac{2}{\lambda} \int_0^{\lambda} f(x) \sin nkx \, dx. \tag{11.6} \]

Obviously, these latter expressions can be reduced to the original definitions
(11.1) and (11.2) by setting $\lambda = 2\pi$.

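The expressions (11.5) and (11.6) become more concise by using the
relations

\[ \cos(nkx) = \frac{e^{inkx} + e^{-inkx}}{2}, \qquad \sin(nkx) = \frac{e^{inkx} - e^{-inkx}}{2i}. \]

Then the Fourier series reads

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( \frac{a_n - i b_n}{2} \right) e^{inkx}
        + \sum_{n=1}^{\infty} \left( \frac{a_n + i b_n}{2} \right) e^{-inkx}. \tag{11.7} \]

We rewrite the index $n$ in the second sum by $-n'$ to find

\[ \sum_{n=1}^{\infty} \left( \frac{a_n + i b_n}{2} \right) e^{-inkx}
   = \sum_{n'=-1}^{-\infty} \left( \frac{a_{n'} - i b_{n'}}{2} \right) e^{in'kx}, \]

where the identities $a_{-n} = a_n$ and $b_{-n} = -b_n$ were used. As a result, we
obtain a complex form of the Fourier series as

\[ f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} c_n e^{inkx} + \sum_{n=-1}^{-\infty} c_n e^{inkx}
        = \sum_{n=-\infty}^{\infty} c_n e^{inkx} \tag{11.8} \]

with the definition

\[ c_n = \frac{a_n - i b_n}{2}. \tag{11.9} \]

An explicit form of $c_n$ is given by substituting the definition of $a_n$ and $b_n$,
given by (11.6), into (11.9) as

\[ c_n = \frac{1}{\lambda} \int_0^{\lambda} f(x) \cos(nkx) \, dx - \frac{i}{\lambda} \int_0^{\lambda} f(x) \sin(nkx) \, dx
       = \frac{1}{\lambda} \int_0^{\lambda} f(x) e^{-inkx} \, dx. \tag{11.10} \]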
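The relation (11.9) between the real and complex coefficients can be confirmed numerically. In the sketch below, the period $\lambda = 3$, the sample function, and all helper names are assumptions made for illustration; the integrals are approximated by a midpoint rule.

```python
import cmath
import math


def integrate(g, a, b, n=2000):
    """Composite midpoint rule; works for real- or complex-valued g."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def real_coefficients(f, n, lam):
    """a_n and b_n of (11.6) for a function of period lam."""
    k = 2.0 * math.pi / lam
    a_n = 2.0 / lam * integrate(lambda x: f(x) * math.cos(n * k * x), 0.0, lam)
    b_n = 2.0 / lam * integrate(lambda x: f(x) * math.sin(n * k * x), 0.0, lam)
    return a_n, b_n


def complex_coefficient(f, n, lam):
    """c_n of (11.10) for a function of period lam."""
    k = 2.0 * math.pi / lam
    return integrate(lambda x: f(x) * cmath.exp(-1j * n * k * x), 0.0, lam) / lam


def sample(x):
    """Sample real function on [0, 3] (hypothetical choice)."""
    return x * (3.0 - x) + math.sin(x)


if __name__ == "__main__":
    for n in range(4):
        a_n, b_n = real_coefficients(sample, n, 3.0)
        print(n, complex_coefficient(sample, n, 3.0), (a_n - 1j * b_n) / 2.0)
```

For a real function the coefficients also satisfy $c_{-n} = c_n^*$, which is the complex counterpart of $a_{-n} = a_n$ and $b_{-n} = -b_n$.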


11.1.4 Half-range Fourier Series 


Fourier series expansions sometimes involve only sine or cosine terms. This
actually occurs when the function being expanded is either even [$f(-x) = f(x)$]
or odd [$f(-x) = -f(x)$] over the interval $[-\lambda/2, \lambda/2]$. When a given
function is even or odd, unnecessary work in determining Fourier coefficients
can be avoided. For instance, for an odd function $f_o(x)$, we have


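\[ a_n = \frac{2}{\lambda} \int_{-\lambda/2}^{\lambda/2} f_o(x) \cos(nkx) \, dx
       = \frac{2}{\lambda} \left\{ \int_{-\lambda/2}^{0} f_o(x) \cos(nkx) \, dx + \int_{0}^{\lambda/2} f_o(x) \cos(nkx) \, dx \right\} \]
\[ \quad = \frac{2}{\lambda} \left\{ -\int_{0}^{\lambda/2} f_o(x) \cos(nkx) \, dx + \int_{0}^{\lambda/2} f_o(x) \cos(nkx) \, dx \right\}
       = 0 \quad (n = 0, 1, 2, \cdots) \tag{11.11} \]

and

\[ b_n = \frac{2}{\lambda} \left\{ \int_{-\lambda/2}^{0} f_o(x) \sin(nkx) \, dx + \int_{0}^{\lambda/2} f_o(x) \sin(nkx) \, dx \right\}
       = \frac{4}{\lambda} \int_{0}^{\lambda/2} f_o(x) \sin(nkx) \, dx \quad (n = 1, 2, \cdots). \tag{11.12} \]

Here we used the identities $\cos(-nkx) = \cos(nkx)$ and $\sin(-nkx) = -\sin(nkx)$.
Accordingly, we have

\[ f_o(x) \sim \sum_{n=1}^{\infty} b_n \sin(nkx), \]

which is called the Fourier sine series.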
Similarly, in the Fourier series corresponding to an even function $f_e(x)$,
the same process yields

\[ a_n = \frac{4}{\lambda} \int_{0}^{\lambda/2} f_e(x) \cos(nkx) \, dx \quad (n = 0, 1, 2, \cdots) \tag{11.13} \]

and $b_n = 0$ for all $n$. Accordingly, the Fourier series becomes

\[ f_e(x) \sim \frac{a_0}{2} + \sum_{n=1}^{\infty} a_n \cos(nkx), \]

which is called the Fourier cosine series.

Note that $a_n$ and $b_n$ given in (11.12) and (11.13) are computed in the
interval $[0, \lambda/2]$, whose width is half of the period $\lambda$. Thus, the Fourier sine or
cosine series of an odd or even function, respectively, is often called a
half-range Fourier series. As discussed later, half-range Fourier series expansion
is important from a practical viewpoint because it enables us to expand a
nonperiodic function within its domain.


@ Theorem: 
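If $f(x)$ is an even or odd function and it is periodic with period $\lambda$, then
the Fourier coefficients $a_n$ and $b_n$ become

\[ a_n = \frac{4}{\lambda} \int_{0}^{\lambda/2} f(x) \cos(nkx) \, dx, \quad b_n = 0 \quad \text{if } f(x) \text{ is even} \]

and

\[ a_n = 0, \quad b_n = \frac{4}{\lambda} \int_{0}^{\lambda/2} f(x) \sin(nkx) \, dx \quad \text{if } f(x) \text{ is odd}. \]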
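The theorem can be checked by comparing the full-range integrals over $[-\lambda/2, \lambda/2]$ with the half-range formulas. The following sketch uses a midpoint rule and two sample functions, $x^2$ (even) and $x^3$ (odd), with $\lambda = 2$; these choices and the helper names are assumptions made here for illustration.

```python
import math


def integrate(g, a, b, n):
    """Composite midpoint rule with n panels."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))


def full_and_half_range(f, lam, n, panels=2000):
    """Return (a_full, b_full, a_half, b_half) for harmonic n."""
    k = 2.0 * math.pi / lam

    def f_cos(x):
        return f(x) * math.cos(n * k * x)

    def f_sin(x):
        return f(x) * math.sin(n * k * x)

    # Coefficients from the full period [-lam/2, lam/2].
    a_full = 2.0 / lam * integrate(f_cos, -lam / 2.0, lam / 2.0, panels)
    b_full = 2.0 / lam * integrate(f_sin, -lam / 2.0, lam / 2.0, panels)
    # Half-range formulas over [0, lam/2].
    a_half = 4.0 / lam * integrate(f_cos, 0.0, lam / 2.0, panels // 2)
    b_half = 4.0 / lam * integrate(f_sin, 0.0, lam / 2.0, panels // 2)
    return a_full, b_full, a_half, b_half


if __name__ == "__main__":
    print(full_and_half_range(lambda x: x**2, 2.0, 3))  # even function
    print(full_and_half_range(lambda x: x**3, 2.0, 3))  # odd function
```

For the even function the full-range $a_n$ matches the half-range value and $b_n$ vanishes; for the odd function the roles are exchanged.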
0 


11.1.5 Fourier Series of Nonperiodic Functions 


A problem that arises quite often in applications is how to apply a Fourier
series expansion to a function $f(x)$ that is defined only on the interval $[0, L]$.
In this case, nothing is said about the periodicity of $f(x)$. However, this does
not prevent us from writing the Fourier series of $f(x)$, since the Euler–Fourier
formulas (11.2) involve only the finite interval.


Fig. 11.2. Functions $f_e(x)$ and $f_o(x)$ defined in (11.14) and (11.15), respectively


As an example, we try to expand the function

\[ f(x) = x \quad \text{for } [0, L] \]

as a Fourier series. In this case, $f(x)$ is not periodic, but we can make it
a periodic function by extending it as an even or odd function over $[-L, L]$
and periodic with period $2L$. The respective definitions of $f_e(x)$ and $f_o(x)$ in
$[-L, L]$ are

\[ f_e(x) = \begin{cases} -x & \text{for } -L \le x < 0, \\ x & \text{for } 0 \le x \le L \end{cases} \tag{11.14} \]

and

\[ f_o(x) = x \quad \text{for } -L < x < L, \tag{11.15} \]

whose profiles are shown in Fig. 11.2.
First, we consider the case of the even function $f_e(x)$. In terms of the
Fourier cosine expansion, the coefficients $a_0$ and $a_n$ are given by

\[ a_n = \frac{2}{L} \int_0^L x \cos(nkx) \, dx =
   \begin{cases} -\dfrac{4L}{\pi^2 n^2}, & n = 1, 3, 5, \cdots, \\ 0, & n = 2, 4, 6, \cdots, \end{cases}
   \qquad a_0 = \frac{2}{L} \int_0^L x \, dx = L. \]

Here we have used $kL = \pi$. Hence, the cosine series becomes

\[ f(x) = \frac{L}{2} - \frac{4L}{\pi^2} \sum_{n=1}^{\infty} \frac{1}{(2n - 1)^2} \cos\frac{(2n - 1)\pi x}{L}. \tag{11.16} \]

The partial sums of the series given in (11.16) are illustrated in Fig. 11.3.
Although the original function $f(x)$ is defined only within the interval $[0, L]$,
the resulting Fourier series produces not only $f(x)$ in $[0, L]$, but also the even
extension $f_e(x)$ with $f_e(x) = f_e(x + 2L)$.

Fig. 11.3. A partial sum on the right-hand side of (11.16)

Second, we look at the sine series of $f_o$ given by (11.15). In this case,

\[ b_n = \frac{2}{L} \int_0^L f_o(x) \sin(nkx) \, dx = \frac{2L}{\pi} \frac{(-1)^{n+1}}{n}, \]

and the sine series is

\[ f(x) = \frac{2L}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} \sin(nkx). \tag{11.17} \]

Figure 11.4 shows some partial sums of (11.17). As in the case of even
extension, the Fourier series produces the odd extension $f_o(x)$ with
$f_o(x) = f_o(x + 2L)$.


Fig. 11.4. A partial sum on the right-hand side of (11.17)


11.1.6 The Rate of Convergence 


We have had two kinds of Fourier series representations for $f(x) = x$ in the
interval $[0, L]$. This poses the following question: Does it make any difference
which kind of Fourier series, (11.16) or (11.17), we use to represent $f(x) = x$
in the interval $[0, L]$? Yes, it does. In the above-mentioned case, the even
extension $f_e(x)$ is more suitable than the odd extension $f_o(x)$ for the following
two reasons.




The first reason concerns the rate of convergence of the resulting Fourier
series. The coefficients given in (11.16) go as $1/(2n - 1)^2$, whereas those in
(11.17) go as $1/n$. Thus, the former series converges more quickly than the
latter. The difference in the rate of convergence is due to the fact that the
periodic extension $f_e(x)$ is continuous, but $f_o(x)$ has discontinuities
at odd multiples of $L$. In general, the Fourier coefficients of discontinuous
functions decay as $1/n$, whereas those of continuous functions decay at least
as rapidly as $1/n^2$. These observations as to the rate of decay of the
coefficients with respect to $n$ can be formulated as follows:


@ Theorem: 

If $f(x)$ and its first $k$ derivatives satisfy the Dirichlet conditions on the
interval $[0, \lambda]$ and if the periodic extensions of $f(x), f'(x), \cdots, f^{(k-1)}(x)$
are all continuous, then the Fourier coefficients of $f(x)$ decay at least as
rapidly as $1/n^{k+1}$.


The second reason is that the Fourier series representation corresponding to
the odd extension $f_o(x)$ exhibits a persistent discrepancy from the original function
$f(x)$ around points of discontinuity of $f_o(x)$. This discrepancy is a Gibbs
phenomenon, illustrated in Sect. 11.3.5. When an extension generates points
of discontinuity, a Gibbs phenomenon will inevitably occur, which makes
the resulting Fourier series representation highly unreliable in the vicinity
of the discontinuity. Consequently, when performing half-range expansions of
nonperiodic functions, the way of extension that renders the resulting function
continuous (and smooth) over its domain is preferred.
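The difference between the two extensions of $f(x) = x$ can be made concrete by comparing partial sums of (11.16) and (11.17) on $[0, L]$. The sketch below is illustrative: the value $L = 5$, the number of terms, the sampling grid, and the function names are assumptions made here.

```python
import math


def cosine_partial_sum(x, length, terms):
    """Partial sum of the cosine series (11.16) with the given number of terms."""
    total = length / 2.0
    for n in range(1, terms + 1):
        m = 2 * n - 1
        total -= (4.0 * length / math.pi**2) * math.cos(m * math.pi * x / length) / m**2
    return total


def sine_partial_sum(x, length, terms):
    """Partial sum of the sine series (11.17) with the given number of terms."""
    total = 0.0
    for n in range(1, terms + 1):
        total += (2.0 * length / math.pi) * (-1.0)**(n + 1) / n \
            * math.sin(n * math.pi * x / length)
    return total


def max_error(partial_sum, length, terms, samples=200):
    """Largest deviation from f(x) = x on a uniform grid over [0, length]."""
    grid = [length * i / samples for i in range(samples + 1)]
    return max(abs(partial_sum(x, length, terms) - x) for x in grid)


if __name__ == "__main__":
    L = 5.0
    print(max_error(cosine_partial_sum, L, 50))
    print(max_error(sine_partial_sum, L, 50))
```

The cosine series approximates $f(x)$ uniformly on $[0, L]$, whereas the sine series fails near $x = L$, where the odd periodic extension jumps and every partial sum vanishes.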


11.1.7 Fourier Series in Higher Dimensions 


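It is important to generalize the Fourier series to more than one dimension.
This generalization is especially useful in crystallography and solid-state
physics, which deal with the three-dimensional periodic structures of atoms
and molecules. To generalize to $N$ dimensions, we first consider a special
case in which an $N$-dimensional periodic function is a product of $N$
one-dimensional periodic functions. That is, we take the $N$ functions $f^{(j)}(x_j)$
[$j = 1, 2, \cdots, N$] with period $L_j$:

\[ f^{(j)}(x_j) = \sum_{n=-\infty}^{\infty} c_n^{(j)} e^{2\pi i n x_j / L_j}, \quad j = 1, 2, \cdots, N. \]

Let us define $F(\boldsymbol{r})$ by the product of all the $N$ functions $f^{(j)}(x_j)$:

\[ F(\boldsymbol{r}) = f^{(1)}(x_1) f^{(2)}(x_2) \cdots f^{(N)}(x_N)
   = \sum_{n_1} \sum_{n_2} \cdots \sum_{n_N} c_{n_1}^{(1)} c_{n_2}^{(2)} \cdots c_{n_N}^{(N)}
     e^{2\pi i (n_1 x_1 / L_1 + \cdots + n_N x_N / L_N)}
   = \sum_{\boldsymbol{k}} C_{\boldsymbol{k}} e^{i \boldsymbol{k} \cdot \boldsymbol{r}}, \tag{11.18} \]

where we have used the following new notation:

\[ C_{\boldsymbol{k}} = c_{n_1}^{(1)} c_{n_2}^{(2)} \cdots c_{n_N}^{(N)}, \quad
   \boldsymbol{k} = 2\pi (n_1 / L_1, n_2 / L_2, \cdots, n_N / L_N), \quad
   \boldsymbol{r} = (x_1, x_2, \cdots, x_N). \]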


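We take (11.18) as the definition of the Fourier series for any periodic function
of $N$ variables. The definition of the coefficient $C_{\boldsymbol{k}}$ can be developed for a
general periodic function $F(\boldsymbol{r})$ of $N$ variables:

\[ F(\boldsymbol{r}) = \sum_{\boldsymbol{k}} C_{\boldsymbol{k}} e^{i \boldsymbol{k} \cdot \boldsymbol{r}}
   \quad \Longleftrightarrow \quad
   C_{\boldsymbol{k}} = \frac{1}{V} \int_V F(\boldsymbol{r}) e^{-i \boldsymbol{k} \cdot \boldsymbol{r}} \, d^N x, \tag{11.19} \]

where $V = L_1 L_2 \cdots L_N$ determines the smallest region of periodicity in $N$
dimensions. When $N = 1$, (11.19) obviously reduces to the Fourier series in
one dimension.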


Remark. The application of (11.19) requires some clarification regarding the
region $V$ of the integral. In one dimension, the shape of the smallest region of
periodicity is unique, being simply a line segment of length $L$. In two or more
dimensions, however, such regions can have a variety of shapes. For instance,
in two dimensions, they can be rectangles, pentagons, hexagons, and so forth.
Thus, we let $V$ in (11.19) stand for a primitive cell of the $N$-dimensional
lattice. This cell in three dimensions, which is important in solid-state physics,
is called the Wigner–Seitz cell.


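Recall that $F(\boldsymbol{r})$ is a periodic function of $\boldsymbol{r}$. This means that when $\boldsymbol{r}$ is changed
by $\boldsymbol{R}$, where $\boldsymbol{R}$ is a vector describing the boundaries of a cell, then we should
get the same function: $F(\boldsymbol{r} + \boldsymbol{R}) = F(\boldsymbol{r})$. This implies that the periodicity of
$F(\boldsymbol{r})$ requires the vector $\boldsymbol{k}$ to take only restricted directions and magnitudes.
In fact, when replacing $\boldsymbol{r}$ in (11.19) by $\boldsymbol{r} + \boldsymbol{R}$, we have

\[ F(\boldsymbol{r} + \boldsymbol{R}) = \sum_{\boldsymbol{k}} C_{\boldsymbol{k}} e^{i \boldsymbol{k} \cdot (\boldsymbol{r} + \boldsymbol{R})}
   = \sum_{\boldsymbol{k}} \left( e^{i \boldsymbol{k} \cdot \boldsymbol{R}} \cdot C_{\boldsymbol{k}} e^{i \boldsymbol{k} \cdot \boldsymbol{r}} \right), \]

which is equal to $F(\boldsymbol{r})$ if

\[ e^{i \boldsymbol{k} \cdot \boldsymbol{R}} = 1, \quad \text{i.e.,} \quad \boldsymbol{k} \cdot \boldsymbol{R} = 2\pi \times (\text{integer}). \tag{11.20} \]

Equation (11.20) is a key relation in determining the allowed directions and
magnitudes of the vector $\boldsymbol{k}$. In one-dimensional cases, the inner product
reduces to $\boldsymbol{k} \cdot \boldsymbol{R} = (2\pi n / L) \cdot L = 2\pi n$; thus (11.20) obviously holds true. In
three dimensions, the vector $\boldsymbol{R}$ is represented as $\boldsymbol{R} = m_1 \boldsymbol{a}_1 + m_2 \boldsymbol{a}_2 + m_3 \boldsymbol{a}_3$,
where $m_1$, $m_2$, and $m_3$ are integers and $\boldsymbol{a}_1$, $\boldsymbol{a}_2$, and $\boldsymbol{a}_3$ are crystal axes,
which are not generally orthogonal. Hence, condition (11.20) is satisfied when
$\boldsymbol{k} = n_1 \boldsymbol{b}_1 + n_2 \boldsymbol{b}_2 + n_3 \boldsymbol{b}_3$, where $n_1$, $n_2$, and $n_3$ are integers and $\boldsymbol{b}_1$, $\boldsymbol{b}_2$, and $\boldsymbol{b}_3$
are the reciprocal lattice vectors defined by

\[ \boldsymbol{b}_1 = \frac{2\pi (\boldsymbol{a}_2 \times \boldsymbol{a}_3)}{\boldsymbol{a}_1 \cdot (\boldsymbol{a}_2 \times \boldsymbol{a}_3)}, \quad
   \boldsymbol{b}_2 = \frac{2\pi (\boldsymbol{a}_3 \times \boldsymbol{a}_1)}{\boldsymbol{a}_1 \cdot (\boldsymbol{a}_2 \times \boldsymbol{a}_3)}, \quad
   \boldsymbol{b}_3 = \frac{2\pi (\boldsymbol{a}_1 \times \boldsymbol{a}_2)}{\boldsymbol{a}_1 \cdot (\boldsymbol{a}_2 \times \boldsymbol{a}_3)}. \]

In fact,

\[ \boldsymbol{k} \cdot \boldsymbol{R} = \left( \sum_{i=1}^{3} n_i \boldsymbol{b}_i \right) \cdot \left( \sum_{j=1}^{3} m_j \boldsymbol{a}_j \right)
   = \sum_{i,j} n_i m_j \, \boldsymbol{b}_i \cdot \boldsymbol{a}_j, \]

and the reader may verify that $\boldsymbol{b}_i \cdot \boldsymbol{a}_j = 2\pi \delta_{ij}$. Thus we obtain

\[ \boldsymbol{k} \cdot \boldsymbol{R} = 2\pi \sum_{j=1}^{3} m_j n_j = 2\pi \times (\text{integer}). \]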
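The orthogonality relation $\boldsymbol{b}_i \cdot \boldsymbol{a}_j = 2\pi \delta_{ij}$ is easy to verify numerically for any set of non-coplanar crystal axes. The sketch below uses face-centered-cubic primitive vectors with unit lattice constant as a sample input; this choice and the helper names are assumptions made here for illustration.

```python
import math


def cross(u, v):
    """Vector product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two 3-vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def reciprocal_vectors(a1, a2, a3):
    """Reciprocal lattice vectors b1, b2, b3 for crystal axes a1, a2, a3."""
    volume = dot(a1, cross(a2, a3))  # must be nonzero (non-coplanar axes)
    scale = 2.0 * math.pi / volume
    b1 = tuple(scale * c for c in cross(a2, a3))
    b2 = tuple(scale * c for c in cross(a3, a1))
    b3 = tuple(scale * c for c in cross(a1, a2))
    return b1, b2, b3


if __name__ == "__main__":
    # Sample axes: fcc primitive vectors, lattice constant 1 (an assumption).
    axes = ((0.0, 0.5, 0.5), (0.5, 0.0, 0.5), (0.5, 0.5, 0.0))
    recips = reciprocal_vectors(*axes)
    for b in recips:
        print([round(dot(b, a) / (2.0 * math.pi), 12) for a in axes])
```

Each printed row is $\boldsymbol{b}_i \cdot \boldsymbol{a}_j / 2\pi$ for fixed $i$, so the rows together should form the identity matrix.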
Exercises 
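1. Expand the following functions in Fourier series:
(i) $f(x) = \sin ax$ on $[-\pi, \pi]$, where $a$ is not an integer.
(ii) $f(x) = \sin ax$ on $[0, \pi]$, where $a$ is not an integer.
Solution: It is straightforward to obtain the results:

\[ \text{(i)} \quad \sin ax = \frac{2 \sin a\pi}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^n \, n \sin nx}{a^2 - n^2}, \]

\[ \text{(ii)} \quad \sin ax = \frac{1 - \cos a\pi}{\pi} \left[ \frac{1}{a} + 2a \sum_{n=1}^{\infty} \frac{\cos 2nx}{a^2 - 4n^2} \right]
   + \frac{1 + \cos a\pi}{\pi} \, 2a \sum_{n=0}^{\infty} \frac{\cos (2n + 1)x}{a^2 - (2n + 1)^2}. \]

2. Expand the function $f(x) = \cos x$ on $[0, \pi]$ in a Fourier sine series.

\[ \text{Solution:} \quad \cos x = \frac{8}{\pi} \sum_{n=1}^{\infty} \frac{n \sin 2nx}{4n^2 - 1}. \]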

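3. (i) Find the Fourier series of $f(x) = x$ on the interval $[-\pi, \pi]$.

(ii) Prove the identity $\dfrac{\pi}{4} = 1 - \dfrac{1}{3} + \dfrac{1}{5} - \dfrac{1}{7} + \cdots$.

Solution:

\[ \text{(i)} \quad f(x) = \sum_{n=1}^{\infty} \frac{2(-1)^{n+1}}{n} \sin nx. \]

(ii) If we substitute $x = \pi/2$ in the series, we obtain

\[ \frac{\pi}{2} = \sum_{n=1}^{\infty} \frac{2(-1)^{n+1}}{n} \sin\frac{n\pi}{2}
   = 2 \left( 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots \right), \]

which obviously gives the desired result.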


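4. Expand the function $f(x) = x^2$ into the Fourier cosine series on the domain
$[-\pi, \pi]$ and then prove that

\[ \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} \quad \text{and} \quad
   \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n^2} = \frac{\pi^2}{12}. \]

Solution: Straightforward calculations yield

\[ x^2 = \frac{\pi^2}{3} + \sum_{n=1}^{\infty} \frac{4(-1)^n}{n^2} \cos nx. \]

By substituting $x = \pi$ and $x = 0$, we obtain the desired equations.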


5. Determine both the cosine and sine series of $f(x) = x^2 - x$ defined on the
interval $[0, 1]$. Which series do you suppose converges more quickly?
Solution: We may set the even and odd extensions of $f(x)$ over
$[-1, 1]$, respectively, as

\[ f_e(x) = x^2 - |x| \quad \text{for } -1 \le x \le 1, \]

and

\[ f_o(x) = \begin{cases} -x^2 - x & \text{for } -1 \le x < 0, \\ x^2 - x & \text{for } 0 \le x \le 1. \end{cases} \]

It follows that $f_o(x)$ is smoother than $f_e(x)$; namely, $f_e(x)$ has a
discontinuity in its derivative at integer values of $x$. This implies that the sine
series converges more rapidly than the cosine series. In fact, straightforward
calculations yield the sine series

\[ f_o(x) = \frac{4}{\pi^3} \sum_{n=1}^{\infty} \frac{(-1)^n - 1}{n^3} \sin(n\pi x) \]

and the cosine series

\[ f_e(x) = -\frac{1}{6} + \sum_{n=1}^{\infty} \frac{2 \left[ 1 + (-1)^n \right]}{n^2 \pi^2} \cos(n\pi x). \]

The continuity of $f_o(x)$ and its first derivative leads to Fourier
coefficients that decay as $1/n^3$, whereas the continuity of $f_e(x)$
coupled with the discontinuity in $f_e'(x)$ leads to Fourier coefficients
that decay as $1/n^2$.


11.2 Mean Convergence of Fourier Series 


11.2.1 Mean Convergence Property 


We know that Fourier series are endowed with a specific class of convergence
called mean convergence (or convergence in the mean). This converging
behavior is expressed by an integral:

\[ \lim_{N \to \infty} \int_0^{\lambda} \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx = 0. \tag{11.21} \]

Equation (11.21) applies regardless of the continuity and smoothness of the
function $f(x)$, as long as $f(x)$ is square-integrable.




Remark. From the viewpoint of Hilbert space theory, the relation (11.21)
comes from the completeness property of the set of functions $\{e^{inkx}\}$ in the
sense of the norm in the $L^2$ space. The $L^2$ space is a specific kind of Hilbert
space that is composed of a set of square-integrable functions $f(x)$ expressed
by

\[ \int |f(x)|^2 dx < \infty. \]

The inner product $(f, g)$ and the norm $\|f\|$ of elements $f, g \in L^2$, respectively,
are given by

\[ (f, g) = \int f(x) g^*(x) \, dx \quad \text{and} \quad
   \|f\| = \sqrt{(f, f)} = \sqrt{\int |f(x)|^2 dx}. \]


The mean convergence of the Fourier series [i.e., the equality in (11.21)] holds
even when the integrand in (11.21) has a nonzero value at discrete points of
$x$. This comes from the fact that the definition of the mean convergence is
determined through integration, and that a finite number of discontinuities
of the integrand do not contribute to the result of its integration. This is
explained schematically in Fig. 11.5, in which we find

$f(x)$: a continuous function,
$g_n^{(1)}(x)$: a series that converges uniformly to $f(x)$
except at a point of discontinuity $x = a$,
$g_n^{(2)}(x)$: a series that converges uniformly to $f(x)$
except at points of discontinuity $x = a_1, a_2, a_3, \cdots$.

As shown in Fig. 11.5, these three functions are distinct from one another.
However, if we integrate the squared deviation between two of them followed
by taking the limit $n \to \infty$, we have

\[ \lim_{n \to \infty} \int \left| f(x) - g_n^{(1)}(x) \right|^2 dx
   = \lim_{n \to \infty} \int \left| f(x) - g_n^{(2)}(x) \right|^2 dx = 0. \tag{11.22} \]

This is because the area enclosed between two of them vanishes as $n \to \infty$; i.e.,
the area right below (or above) points of discontinuity is zero owing to their
discreteness. Thus, the series $g_n^{(1)}(x)$ and $g_n^{(2)}(x)$ both converge to $f(x)$ in the
mean regardless of their discrepancy from $f(x)$ at points of discontinuity.

Fig. 11.5. Sketches of a continuous function $f(x)$, a series of functions $g_n^{(1)}(x)$
converging to $f(x)$ except at a point, and a similar series of functions $g_n^{(2)}(x)$
having several discontinuities. Series $\{g_n^{(1)}(x)\}$ and $\{g_n^{(2)}(x)\}$ both converge in the
mean to $f(x)$


11.2.2 Dirichlet and Fejér Integrals 


It is instructive to give an alternative exposition of mean convergence of
Fourier series, which is based on two important concepts: Dirichlet's
integral and Fejér's integral.

Consider the partial sum $S_N(x)$ of the Fourier series of $f(x)$ expressed by

\[ S_N(x) = \sum_{n=-N}^{N} c_n e^{inkx} \]

and its arithmetic mean

\[ \sigma_N(x) = \frac{1}{N + 1} (S_0 + S_1 + \cdots + S_N). \tag{11.23} \]

After some algebra, we obtain their integral representations as given below
(see Exercises 1 and 2 in Sect. 11.2 for reference).


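@ Dirichlet integral:

\[ S_N(x) = \frac{1}{\lambda} \int_{-\lambda/2}^{\lambda/2} f(t + x)
   \frac{\cos(Nkt) - \cos\{(N + 1)kt\}}{1 - \cos(kt)} \, dt. \tag{11.24} \]

@ Fejér integral:

\[ \sigma_N(x) = \frac{1}{(N + 1)\lambda} \int_{-\lambda/2}^{\lambda/2} f(t + x)
   \frac{\sin^2 \frac{N+1}{2} kt}{\sin^2 \frac{1}{2} kt} \, dt. \tag{11.25} \]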

Remark. Note the distinctive difference between the convergence of $S_N$ and
that of $\sigma_N$. Whereas $\lim_{N \to \infty} S_N = S$ implies $\sigma_N \to S$, the converse does
not generally hold true; in fact, $\sigma_N$ may converge even when $S_N$ diverges.
A typical example is the case of the numerical sequence $u_n = (-1)^n$
($n = 0, 1, 2, \cdots$), where $S_N = \sum_{n=0}^{N} u_n$ does not converge because
$S_{2m} = 1$ and $S_{2m+1} = 0$, whereas
$\sigma_N = \sum_{n=0}^{N} S_n / (N + 1)$ converges to $1/2$.
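The numerical sequence mentioned in this remark can be examined directly. The sketch below takes $u_n = (-1)^n$ with the index starting at $n = 0$ (an indexing assumption made here), and the function names are hypothetical.

```python
def partial_sums(terms):
    """Partial sums S_0, S_1, ... of u_n = (-1)**n, with n starting at 0."""
    total, sums = 0, []
    for n in range(terms):
        total += (-1)**n
        sums.append(total)
    return sums


def arithmetic_means(terms):
    """Arithmetic means sigma_N = (S_0 + ... + S_N) / (N + 1)."""
    running, means = 0, []
    for index, s in enumerate(partial_sums(terms)):
        running += s
        means.append(running / (index + 1))
    return means


if __name__ == "__main__":
    print(partial_sums(6))           # oscillates and never settles
    print(arithmetic_means(1001)[-1])  # close to 1/2
```

The partial sums keep alternating, while their arithmetic means settle down, which is the behavior that makes $\sigma_N$ a more robust object than $S_N$.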




By putting $f(x) = 1$ in (11.25), we are led to the following definition:

@ Dirichlet kernel: The function

\[ D_N(t) = \frac{1}{N + 1} \frac{\sin^2 \frac{N+1}{2} kt}{\sin^2 \frac{1}{2} kt} \tag{11.26} \]

is called the Dirichlet kernel (in the standard literature this function is
usually called the Fejér kernel), which satisfies the identity:

\[ 1 = \frac{1}{\lambda} \int_{-\lambda/2}^{\lambda/2} D_N(t) \, dt. \tag{11.27} \]

The derivation of the identity (11.27) is straightforward. When $f(x) = 1$, we
have $f(t + x) = 1$, $c_0 = 1$, and $c_n = 0$ ($|n| \ge 1$), which obviously yield $S_N = 1$
for arbitrary $N$ and thus $\sigma_N = 1$. Substitute this into (11.25) to obtain the
identity (11.27). Figure 11.6 plots the behavior of $D_N(t)$ with increasing $N$;
it shows maxima at $t = 0, \pm\lambda, \pm 2\lambda, \cdots$, and the magnitude of the maxima,
which equals $N + 1$, diverges when $N \to \infty$.
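The normalization (11.27) and the growth of the central maximum of the kernel (11.26) can be checked numerically. The sketch below assumes a period $\lambda = 2$ and a midpoint rule with an even number of panels so that the point $t = 0$ is never sampled; these choices and the function names are illustrative assumptions.

```python
import math


def kernel(t, order, lam):
    """Kernel D_N(t) of (11.26) for N = order and period lam."""
    k = 2.0 * math.pi / lam
    s = math.sin(0.5 * k * t)
    if abs(s) < 1e-12:
        # Limiting value at t = 0, +-lam, +-2*lam, ...
        return float(order + 1)
    return math.sin(0.5 * (order + 1) * k * t)**2 / ((order + 1) * s**2)


def normalization(order, lam, panels=4000):
    """Midpoint-rule estimate of (1/lam) * integral of D_N over one period."""
    h = lam / panels
    total = sum(kernel(-lam / 2.0 + (i + 0.5) * h, order, lam)
                for i in range(panels))
    return total * h / lam


if __name__ == "__main__":
    for order in (1, 5, 20):
        print(order, kernel(0.0, order, 2.0), normalization(order, 2.0))
```

The estimate of the integral stays at unity while the peak height grows linearly with $N$, so the kernel concentrates around $t = 0$ as $N$ increases.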

Fig. 11.6. Dirichlet's kernel $D_N(t)$ defined in (11.26)

From (11.25) and (11.27), we arrive at the key relation

\[ \sigma_N(x) - f(x) = \frac{1}{\lambda} \int_{-\lambda/2}^{\lambda/2} \{ f(t + x) - f(x) \} D_N(t) \, dt. \tag{11.28} \]

If $f(x)$ is continuous (piecewise, at least), the integral in (11.28) can be made
arbitrarily small by taking a sufficiently large $N$ (see Exercise 3 below). To
be precise, there exists an $m$ for each $\varepsilon > 0$ such that

\[ N > m \quad \Rightarrow \quad |\sigma_N(x) - f(x)| < \varepsilon. \tag{11.29} \]

This clearly means that $\sigma_N(x)$ converges uniformly to $f(x)$ if $f(x)$ is
continuous.

As is shown later, the result (11.29) immediately yields the mean convergence
of the Fourier series to $f(x)$.


11.2.3 Proof of the Mean Convergence of Fourier Series 


We are now in a position to prove the mean convergence property of Fourier
series.

The function $\sigma_N(x)$ can be expressed as a trigonometric polynomial,
since it consists of the $N + 1$ trigonometric polynomials $S_0, S_1, \cdots, S_N$ as given by
(11.23). Hence, (11.29) implies the existence of a sequence of trigonometric
polynomials that converges uniformly to $f(x)$. [This is simply Fejér's theorem
(see Sect. 11.3.2).] Thus we have

\[ \sigma_N(x) = \sum_{n=-N}^{N} \gamma_n e^{inkx}, \]

where all the coefficients $\{\gamma_n\}$ have to be determined.
We now make use of the fact that for any choice of $\{\gamma_n\}$, the inequality

\[ \int_0^{\lambda} \left| f(x) - \sum_{n=-N}^{N} \gamma_n e^{inkx} \right|^2 dx
   \ge \int_0^{\lambda} \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx \]

holds true with the Fourier coefficients $\{c_n\}$ of $f(x)$. (See the discussion in
Sect. 11.2.4 for the proof.) Taking the limit $N \to \infty$ yields

\[ \lim_{N \to \infty} \int_0^{\lambda} |f(x) - \sigma_N(x)|^2 dx
   \ge \lim_{N \to \infty} \int_0^{\lambda} \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx. \tag{11.30} \]

Let $f(x)$ be continuous (piecewise, at least). Then the left-hand side vanishes
owing to the uniform convergence of $\sigma_N(x)$ to $f(x)$ at continuous points $x$ of
$f(x)$. (A finite number of discontinuous points of $f(x)$ makes no contribution
to the integral.) Eventually, we come to the desired conclusion:

\[ \lim_{N \to \infty} \int_0^{\lambda} \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx = 0, \]

which is a restatement of (11.21).


11.2.4 Parseval Identity 


The mean convergence property of Fourier series can be represented by a more concise expression, called the Parseval identity. We first note the main conclusion of this subsection and then go on to its proof. For simplicity of notation, we use the following short form:

$$\frac{1}{\lambda}\int_0^\lambda f(x)\, g^*(x)\, dx \equiv (f, g).$$

♠ Parseval identity:
A necessary and sufficient condition for the mean convergence of the Fourier series of f(x) is given by

$$(f, f) = \sum_{n=-\infty}^{\infty} |c_n|^2,$$

which is called the Parseval identity.


To prove the above statement, we assume f(x) to be square-integrable and consider the total squared error of f(x) relative to the series of exponential functions:

$$E_N = \frac{1}{\lambda}\int_0^\lambda \left| f(x) - \sum_{n=-N}^{N} \gamma_n e^{inkx} \right|^2 dx, \tag{11.31}$$

whose variables are N and the sequence $\{\gamma_n\}$ consisting of complex numbers. Term-by-term integration of (11.31) yields

$$\begin{aligned}
E_N &= (f,f) - \sum_{n=-N}^{N} \gamma_n^* \left(f, e^{inkx}\right) - \sum_{n=-N}^{N} \gamma_n \left(e^{inkx}, f\right) + \sum_{m,n=-N}^{N} \gamma_m^* \gamma_n \left(e^{inkx}, e^{imkx}\right) \\
&= (f,f) - \sum_{n=-N}^{N} \left(\gamma_n^* c_n + \gamma_n c_n^*\right) + \sum_{n=-N}^{N} |\gamma_n|^2 \\
&= (f,f) + \sum_{n=-N}^{N} |\gamma_n - c_n|^2 - \sum_{n=-N}^{N} |c_n|^2.
\end{aligned} \tag{11.32}$$

Here we have used the orthonormality of imaginary exponentials, $(e^{imkx}, e^{inkx}) = \delta_{mn}$, as well as the definition of the Fourier coefficient $c_n = (f, e^{inkx})$. Note that (f, f) appearing in (11.32) is nonnegative because

$$(f, f) = \frac{1}{\lambda}\int_0^\lambda |f(x)|^2\, dx \ge 0.$$

Hence, $E_N$ becomes minimal when $\gamma_n = c_n$, and its minimum value reads

$$\min\{E_N\} = (f,f) - \sum_{n=-N}^{N} |c_n|^2. \tag{11.33}$$
We are now ready to complete our proof. Recall that the mean convergence of the Fourier series for f(x) is defined by

$$\lim_{N\to\infty} \int_0^\lambda \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx = 0. \tag{11.34}$$

From (11.31) and (11.33), we see that the definition of the mean convergence (11.34) is rewritten as

$$\lim_{N\to\infty} \min\{E_N\} = 0, \tag{11.35}$$

or equivalently,

$$(f, f) = \sum_{n=-\infty}^{\infty} |c_n|^2. \tag{11.36}$$

Relation (11.36) is thus a necessary and sufficient condition for satisfying the mean convergence of the Fourier series to f(x). Since Parseval’s identity applies to any square-integrable function f, Fourier series for the functions f surely converge in the mean to f(x).
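
As a concrete check of the Parseval identity, the sketch below uses an example that is not in the original text: the sawtooth f(x) = x on [0, 1) with period $\lambda = 1$. For this function a direct integration by parts gives $c_0 = 1/2$ and $|c_n| = 1/(2\pi|n|)$ for $n \neq 0$, while $(f, f) = 1/3$. The code tracks the gap between $(f,f)$ and the partial sums of $|c_n|^2$.

```python
import math

def bessel_partial_sum(N):
    """sum_{n=-N}^{N} |c_n|^2 for the sawtooth f(x) = x on [0, 1):
    c_0 = 1/2 and |c_n| = 1/(2*pi*|n|) for n != 0 (assumed example)."""
    total = 0.25
    for n in range(1, N + 1):
        total += 2.0 / (2.0 * math.pi * n) ** 2  # terms n and -n together
    return total

norm_sq = 1.0 / 3.0  # (f, f) = int_0^1 x^2 dx
# Gap (f, f) - sum |c_n|^2 = min{E_N}; it should be positive and shrink to zero.
gaps = [norm_sq - bessel_partial_sum(N) for N in (1, 10, 100, 1000)]
```

The gap equals min{E_N} in (11.33): it stays nonnegative for every N and tends to zero as N grows, in line with (11.35) and (11.36).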


11.2.5 Riemann—Lebesgue Theorem 


As by-products of the argument in Sect. 11.2.4, we obtain the following two important properties regarding the Fourier series expansion. The first is the Bessel inequality

$$(f, f) \ge \sum_{n=-N}^{N} |c_n|^2. \tag{11.37}$$

This is obtained from the fact that $\min\{E_N\}$ given in (11.33) is nonnegative. Here we can let $N \to \infty$ in (11.37), because the right-hand side of (11.37) forms a monotonically increasing sequence that is bounded by its left-hand side. Then we obtain

$$(f, f) \ge \sum_{n=-\infty}^{\infty} |c_n|^2. \tag{11.38}$$

We further note that the series on the right-hand side of (11.38) necessarily converges, since it is nondecreasing and bounded from above. Consequently, its terms must tend to zero, and we arrive at the second important property to be noted:

$$\lim_{n\to\infty} c_n = 0. \tag{11.39}$$

Separating the real and imaginary parts in (11.39), we eventually obtain the following theorem:

♠ Riemann–Lebesgue theorem:
If f(x) is square-integrable on the interval $[0, \lambda]$, then

$$\lim_{n\to\infty} \int_0^\lambda f(x)\cos(nkx)\, dx = 0, \qquad \lim_{n\to\infty} \int_0^\lambda f(x)\sin(nkx)\, dx = 0.$$


Exercises 
1. Derive the expressions (11.24) and (11.25) regarding the Dirichlet and Fejér integrals, respectively.

Solution: From the definition of $c_n$, the partial sum $S_N(x)$ yields its integral form:

$$\begin{aligned}
S_N(x) &= \sum_{n=-N}^{N} \left[ \frac{1}{\lambda}\int_0^\lambda f(t)\, e^{-inkt}\, dt \right] e^{inkx} \\
&= \frac{1}{\lambda}\int_0^\lambda f(t) \sum_{n=-N}^{N} e^{-ink(t-x)}\, dt \\
&= \frac{1}{\lambda}\int_{-x}^{\lambda - x} f(t+x) \sum_{n=-N}^{N} e^{-inkt}\, dt.
\end{aligned} \tag{11.40}$$

The finite series of exponential terms reads

$$\sum_{n=-N}^{N} e^{-inkt} = e^{-iNkt} \sum_{n=0}^{2N} e^{inkt} = e^{-iNkt}\, \frac{1 - e^{i(2N+1)kt}}{1 - e^{ikt}} = \frac{\cos(Nkt) - \cos\{(N+1)kt\}}{1 - \cos(kt)}.$$

Substituting this in (11.40) yields Dirichlet’s integral (11.24). Moreover, its arithmetic mean reduces to Fejér’s integral (11.25) as demonstrated by

$$\begin{aligned}
\sigma_N(x) &= \frac{1}{N+1}\left\{ S_0 + S_1 + \cdots + S_N \right\} \\
&= \frac{1}{(N+1)\lambda}\int_{-x}^{\lambda - x} f(t+x)\, \frac{1 - \cos\{(N+1)kt\}}{1 - \cos(kt)}\, dt \\
&= \frac{1}{(N+1)\lambda}\int_{-\lambda/2}^{\lambda/2} f(t+x)\, \frac{\sin^2\left(\frac{N+1}{2}kt\right)}{\sin^2\left(\frac{1}{2}kt\right)}\, dt.
\end{aligned} \tag{11.41}$$

In the last line of (11.41), the interval of integration is changed from $[-x, \lambda - x]$ to $[-\lambda/2, \lambda/2]$ by taking account of the periodicity of the integrand. ♣


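2. Prove that $\sigma_N(x)$ converges uniformly to f(x) by postulating the continuity of f(x).

Solution: Recall that the continuity of f(x) allows us to determine a $\delta$ that satisfies

$$|x - x'| < \delta \;\Longrightarrow\; |f(x) - f(x')| < \varepsilon \tag{11.42}$$

for an arbitrarily small positive $\varepsilon$. Further, owing to its continuity, the function f(x) is bounded as $|f(x)| < M$ with an appropriate constant M. We divide the range of integration given in (11.28) as $\int_{-\lambda/2}^{\lambda/2} = \int_{-\lambda/2}^{-\delta} + \int_{-\delta}^{\delta} + \int_{\delta}^{\lambda/2}$ and use the inequality (11.42) to obtain, for the middle term,

$$\left| \int_{-\delta}^{\delta} \{f(t+x) - f(x)\}\, D_N(t)\, dt \right| \le \int_{-\delta}^{\delta} |f(t+x) - f(x)|\, D_N(t)\, dt < \varepsilon \int_{-\delta}^{\delta} D_N(t)\, dt < \varepsilon\lambda \tag{11.43}$$

and

$$\begin{aligned}
\left| \left( \int_{-\lambda/2}^{-\delta} + \int_{\delta}^{\lambda/2} \right) \{f(t+x) - f(x)\}\, D_N(t)\, dt \right|
&\le \left( \int_{-\lambda/2}^{-\delta} + \int_{\delta}^{\lambda/2} \right) \frac{2M \sin^2\left(\frac{N+1}{2}kt\right)}{(N+1)\sin^2(kt/2)}\, dt \\
&\le \left( \int_{-\lambda/2}^{-\delta} + \int_{\delta}^{\lambda/2} \right) \frac{2M\, dt}{(N+1)\sin^2(k\delta/2)} \\
&< \frac{2M\lambda}{(N+1)\sin^2(k\delta/2)}.
\end{aligned} \tag{11.44}$$

From (11.28), (11.43), and (11.44), we obtain

$$|\sigma_N(x) - f(x)| < \varepsilon + \frac{2M}{(N+1)\sin^2(k\delta/2)}.$$

Taking the limit $N \to \infty$ with the small quantity $\delta$ fixed, the second term vanishes. We thus conclude that

$$\lim_{N\to\infty} |\sigma_N(x) - f(x)| = 0. \quad ♣ \tag{11.45}$$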




11.3 Uniform Convergence of Fourier series 


11.3.1 Criterion for Uniform and Pointwise Convergence 


We know that the Fourier series of f(x) converges in the mean to f(x) as long as f(x) is square-integrable. However, the mean convergence of the Fourier series provides no information as to its uniform convergence. In order for the Fourier series to converge (uniformly or pointwise) to the original function f(x), several conditions regarding continuity and periodicity of f(x) have to be satisfied. These are formally stated in the following two theorems:


♠ Uniform convergence of Fourier series:
The Fourier series of a continuous, piecewise smooth, and periodic function f(x) converges to f(x) absolutely and uniformly.


♠ Pointwise convergence of Fourier series:
The Fourier series of a piecewise smooth and periodic function f(x) (continuous or discontinuous) converges to:

(i) f(x) at any point of continuity, and

(ii) $\dfrac{f(x+0) + f(x-0)}{2}$ at any point of discontinuity.


Our main concern in this section is to prove these two theorems, and we follow this by demonstrating several important features of Fourier series that occur at discontinuous points of f(x).


Remark. Observe that the above theorems are consistent with the conclusion of the Dirichlet theorem given in Sect. 11.1.2; the latter says that a Fourier series representation becomes identical to f(x) provided that f(x) is periodic, continuous, and further smooth (piecewise, at least).


11.3.2 Fejér theorem 


The theorems given in the previous subsection clearly exhibit sufficient conditions for the Fourier series to converge. It is instructive to compare them with the Fejér theorem:


♠ Fejér theorem:
Any continuous and periodic function f(x) with a period $\lambda$ can be reproduced by an infinite trigonometric series

$$\lim_{N\to\infty} \left| f(x) - \sum_{n=-N}^{N} \gamma_n e^{inkx} \right| = 0 \quad \text{for all } x, \tag{11.46}$$

with an appropriate choice of the set of expansion coefficients $\{\gamma_n\}$.


At first glance, Fejér’s theorem appears to ensure the uniform convergence of the Fourier series. However, this is not the case at all; the sequence of the optimal coefficients $\{\gamma_n\}$ satisfying (11.46) cannot in general be replaced by the Fourier coefficients $\{c_n\}$ defined by

$$c_n = \frac{1}{\lambda}\int_0^\lambda f(x)\, e^{-inkx}\, dx.$$

In fact, even when f(x) is continuous and periodic, its Fourier series may diverge at discrete points, as is expressed by

$$\lim_{N\to\infty} \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right| = \infty \quad \text{at some points } x. \tag{11.47}$$

Hence, Fejér’s theorem does not guarantee the uniform convergence of the Fourier series representation. Equation (11.47) also suggests that the continuity and periodicity of f(x) are only necessary but not sufficient conditions for the uniform convergence of its Fourier series to the original function f(x).


11.3.3 Proof of Uniform Convergence 


We are now in a position to prove the criterion for uniform convergence of the Fourier series $\sum_{n=-\infty}^{\infty} c_n e^{inkx}$ to f(x). The proof that is presented below is based on the mean convergence property of Fourier series. Recall that the mean convergence of Fourier series is expressed by

$$\lim_{N\to\infty} \int_0^\lambda \left| f(x) - \sum_{n=-N}^{N} c_n e^{inkx} \right|^2 dx = 0. \tag{11.48}$$

In general, the relation

$$\lim_{N\to\infty} \int_a^b \left| \sum_{n=0}^{N} u_n(x) \right|^2 dx = 0$$

means

$$\sum_{n=0}^{\infty} u_n(x) = 0$$

if and only if the infinite series $\sum_{n=0}^{\infty} u_n(x)$ converges uniformly to a certain function of x within the range of integration [a, b]. Therefore, in order to obtain the desired equality

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{inkx}$$

for any $x \in [0, \lambda]$, we must seek the condition that the infinite series $\sum_{n=-\infty}^{\infty} c_n e^{inkx}$ converges uniformly to some function of x [not necessarily to f(x)]. Thus, we rewrite the Fourier coefficient $c_n$ (for $n \neq 0$) as

$$\begin{aligned}
c_n &= \frac{1}{\lambda}\int_0^\lambda f(x)\, e^{-inkx}\, dx \\
&= -\frac{1}{ink\lambda}\left[ f(x)\, e^{-inkx} \right]_0^\lambda + \frac{1}{ink\lambda}\int_0^\lambda f'(x)\, e^{-inkx}\, dx \\
&= \frac{1}{ink\lambda}\int_0^\lambda f'(x)\, e^{-inkx}\, dx = \frac{c'_n}{ink},
\end{aligned} \tag{11.49}$$

where $c'_n$ is the Fourier coefficient of the derivative $f'(x)$. Here f(x) is assumed to be periodic, i.e., $f(0) = f(\lambda)$ and $k\lambda = 2\pi$, so that the boundary term vanishes. We further assume that f(x) is continuous and smooth (piecewise, at least) on the interval $[0, \lambda]$. Then, $f'(x)$ is continuous (or piecewise continuous) to yield Parseval’s identity:

$$\frac{1}{\lambda}\int_0^\lambda |f'(x)|^2\, dx = \sum_{n=-\infty}^{\infty} |c'_n|^2 \equiv A,$$

where A is a constant. Observe that

$$\sum_{n \neq 0} |c_n| = \sum_{n \neq 0} \left| \frac{c'_n}{ink} \right| = \sum_{n \neq 0} \frac{|c'_n|}{|n|k}. \tag{11.50}$$

From the Schwarz inequality, it follows that

$$\sum_{n \neq 0} \frac{|c'_n|}{|n|} \le \sqrt{\sum_{n \neq 0} |c'_n|^2}\, \sqrt{\sum_{n \neq 0} \frac{1}{n^2}} \le \sqrt{A \sum_{n \neq 0} \frac{1}{n^2}}. \tag{11.51}$$

Note that $\sum_{n=1}^{\infty} (1/n^2)$ is convergent (see the remark below). Hence, from (11.50) and (11.51), we see that $\sum_{n=-\infty}^{\infty} |c_n|$ converges. This implies that $\sum_{n=-\infty}^{\infty} c_n e^{inkx}$ converges uniformly to a certain function on $[0, \lambda]$ since $|c_n e^{inkx}| \le |c_n|$ for all n on $[0, \lambda]$. (See Sect. 3.3.1 for the criteria for uniform convergence.) This completes our proof.




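Remark. That the series $\sum_{n=1}^{\infty} (1/n^2)$ is convergent is verified as follows: set $A_{2^{k+1}-1}$ to be a partial sum consisting of the first $2^{k+1} - 1$ terms. Then we have

$$\begin{aligned}
A_{2^{k+1}-1} &= 1 + \left( \frac{1}{2^2} + \frac{1}{3^2} \right) + \left( \frac{1}{4^2} + \cdots + \frac{1}{7^2} \right) + \cdots + \left( \frac{1}{(2^k)^2} + \cdots + \frac{1}{(2^{k+1}-1)^2} \right) \\
&< 1 + \frac{2}{2^2} + \frac{4}{4^2} + \cdots + \frac{2^k}{(2^k)^2} \\
&= \sum_{j=0}^{k} \left( \frac{1}{2} \right)^j = \frac{1 - (1/2)^{k+1}}{1 - (1/2)} < 2.
\end{aligned}$$

This means that $A_{2^{k+1}-1}$ for any k is bounded above. Furthermore, the sequence $(A_m)$ is monotonically increasing. Hence, $(A_m)$ converges in the limit of $m \to \infty$, which completes the proof.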
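
The grouping argument in the remark can be verified numerically. The short sketch below is an illustration added here, not part of the original text; the function names are hypothetical. It compares each partial sum $A_{2^{k+1}-1}$ with the geometric bound $\sum_{j=0}^{k}(1/2)^j$ used above.

```python
def partial_sum(m):
    """A_m = sum_{n=1}^{m} 1/n^2."""
    return sum(1.0 / n**2 for n in range(1, m + 1))

def grouped_bound(k):
    """1 + 2/2^2 + 4/4^2 + ... + 2^k/(2^k)^2 = sum_{j=0}^{k} (1/2)^j."""
    return sum(0.5**j for j in range(k + 1))

# Triples (k, A_{2^(k+1)-1}, geometric bound) for a range of k.
checks = [(k, partial_sum(2 ** (k + 1) - 1), grouped_bound(k)) for k in range(1, 12)]
```

For every k in the range, the partial sum lies below the geometric bound, which in turn lies below 2.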


11.3.4 Pointwise Convergence at Discontinuous Points 


This subsection gives an account of the second criterion in Sect. 11.3.1, which is restated below.


♠ Pointwise convergence at discontinuities:

When a function f(x) is piecewise continuous and piecewise smooth, its Fourier series converges pointwise to $\{f(x+0) + f(x-0)\}/2$ at a point of discontinuity.


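This theorem can be proven in the following manner. It readily follows from (11.24) that the partial sum of the Fourier series $S_N(x)$ is expressed by

$$S_N(x) = \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} f(x+t)\, \frac{e^{i(N+\frac12)kt} - e^{-i(N+\frac12)kt}}{\sin\left(\frac12 kt\right)}\, dt.$$

We rewrite this as

$$\begin{aligned}
S_N(x) &= \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} f(x+t)\, \frac{e^{\frac{i}{2}kt}}{\sin\left(\frac12 kt\right)}\, e^{iNkt}\, dt + \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} f(x-t)\, \frac{e^{\frac{i}{2}kt}}{\sin\left(\frac12 kt\right)}\, e^{iNkt}\, dt \\
&= \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} \left[ f(x+t) + f(x-t) \right] \frac{e^{\frac{i}{2}kt}}{\sin\left(\frac12 kt\right)}\, e^{iNkt}\, dt.
\end{aligned}$$

Here we have set $t \to -t$ in the second integral in the first line and used the periodicity of the integrand to restore the range of integration. Further,

$$S_N(x) = \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} g(t)\, e^{iNkt}\, dt + \frac{f(x+0) + f(x-0)}{2i\lambda}\int_{-x}^{\lambda - x} \frac{e^{\frac{i}{2}kt}\, e^{iNkt}}{\sin\left(\frac12 kt\right)}\, dt, \tag{11.52}$$

where we have introduced the notation

$$g(t) = \left\{ f(x+t) - f(x+0) + f(x-t) - f(x-0) \right\} \frac{e^{\frac{i}{2}kt}}{\sin\left(\frac12 kt\right)}.$$

The second term in (11.52) can be simplified via the relation

$$\frac{1}{i\lambda}\int_{-x}^{\lambda - x} \frac{e^{\frac{i}{2}kt}\, e^{iNkt}}{\sin\left(\frac12 kt\right)}\, dt = 1.$$

(See Exercise 3 in Sect. 11.3 for its derivation.) Substituting this into (11.52), we get

$$S_N(x) = \frac{1}{2i\lambda}\int_{-x}^{\lambda - x} g(t)\, e^{iNkt}\, dt + \frac{f(x+0) + f(x-0)}{2}. \tag{11.53}$$

If the integration term in (11.53) vanishes with $N \to \infty$, we will successfully obtain the desired result. In fact, when g(t) is piecewise continuous in the interval $[-x, \lambda - x]$, the integral in (11.53) vanishes owing to the Riemann–Lebesgue theorem (see Sect. 11.2.5). The remaining task is, therefore, to prove the piecewise continuity of g(t) on $[-x, \lambda - x]$, which is actually verified through the following discussion.

When $t \neq 0$, f is piecewise continuous and $1/\sin(kt/2)$ and $e^{ikt/2}$ are bounded away from $t = 0$; thus g(t) is surely piecewise continuous. Near $t = 0$, we write

$$g(t) = \left\{ \frac{f(x+t) - f(x+0)}{t} + \frac{f(x-t) - f(x-0)}{t} \right\} \frac{t}{\sin\left(\frac12 kt\right)}\, e^{\frac{i}{2}kt},$$

so

$$\lim_{t\to 0} g(t) = \left\{ \lim_{t\to 0} \frac{f(x+t) - f(x+0)}{t} + \lim_{t\to 0} \frac{f(x-t) - f(x-0)}{t} \right\} \cdot \frac{2}{k}. \tag{11.54}$$

The first and second terms in (11.54) are, up to sign, the derivatives of f(x) on the right and left, respectively. Since f is assumed to be piecewise smooth, $f'$ is piecewise continuous; thus both terms in (11.54) exist. This indicates that the limit $\lim_{t\to 0} g(t)$ exists, so g(t) is piecewise continuous within the interval $[-x, \lambda - x]$.

Consequently, we can conclude from (11.53) that

$$\lim_{N\to\infty} S_N(x) = \frac{f(x+0) + f(x-0)}{2},$$

which implies the pointwise convergence of the Fourier series to $[f(x+0) + f(x-0)]/2$.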


11.3.5 Gibbs Phenomenon 


If a function f(x) has discontinuities in the defining region, its Fourier series does not reproduce the behavior of f(x) at points of discontinuity. In other words, the partial sums of a Fourier series cannot approach f(x) uniformly in the vicinity of a point of discontinuity. Furthermore, close to discontinuous points, the Fourier series inevitably overshoots the value of the original function to be expanded. The size of the overshoot is proportional to the magnitude of the discontinuity. This overshoot, which is known as the Gibbs phenomenon, is nicely illustrated with the Fourier series for the step function

$$f(x) = \begin{cases} +1 & \text{for } 0 < x < \dfrac{\lambda}{2}, \\[4pt] -1 & \text{for } \dfrac{\lambda}{2} < x < \lambda, \end{cases}$$

which is a periodic square wave with period $\lambda$. The complex Fourier coefficient $c_n$ reads

$$c_n = \frac{1}{\lambda}\int_0^{\lambda/2} e^{-inkx}\, dx - \frac{1}{\lambda}\int_{\lambda/2}^{\lambda} e^{-inkx}\, dx = \begin{cases} 0 & (n = \text{even}), \\[4pt] \dfrac{2}{i\pi n} & (n = \text{odd}), \end{cases}$$

which yields

$$f(x) = \frac{4}{\pi}\sum_{n=1}^{\infty} \frac{\sin\{(2n-1)kx\}}{2n-1}. \tag{11.55}$$

Figure 11.7 shows f(x) for $0 < x < \lambda$ for the sum of four, six, and ten terms of the series. Three features deserve attention.


(i) There is a steady increase in the accuracy of the representation as the 
number of included terms is increased. 


(ii) All the curves pass through the midpoint f(x) = 0 at the points of discontinuity $x = n\lambda/2$ $(n = 0, \pm 1, \pm 2, \cdots)$.


(iii) In the vicinity of $x = n\lambda/2$, there is an overshoot that persists and shows no sign of diminishing.


Fig. 11.7. Gibbs phenomena for the Fourier series of a step function. The partial sums of one, five, and fifty terms of the right-hand side of (11.55) are given


As more and more terms are taken, the small oscillations along each horizontal portion get smaller and smaller and, except for the two outermost ones of each portion closest to the discontinuities, eventually disappear. Even in the limit of an infinite number of terms, there is still a small overshoot. This overshoot is nothing but what we call the Gibbs phenomenon, as a result of which the Fourier series cannot converge uniformly at a point of discontinuity.
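
The persistence of the overshoot can be seen directly from the partial sums of the square-wave series. The following sketch is an illustration under stated assumptions, not the authors' computation: it takes $\lambda = 1$, uses the sine series $(4/\pi)\sum \sin\{(2m-1)kx\}/(2m-1)$ for the step function above, and records the largest value of each partial sum on a fine grid over the half period where f(x) = +1.

```python
import numpy as np

def square_wave_partial_sum(x, terms, lam=1.0):
    """First `terms` terms of (4/pi) * sum_{m>=1} sin((2m-1) k x) / (2m-1)."""
    k = 2 * np.pi / lam
    odd = 2 * np.arange(1, terms + 1) - 1
    return (4 / np.pi) * (np.sin(k * np.outer(x, odd)) / odd).sum(axis=1)

x = np.linspace(0.0, 0.5, 10001)  # half a period, where f(x) = +1
# Largest value reached by each partial sum; f itself never exceeds 1.
peaks = {terms: square_wave_partial_sum(x, terms).max() for terms in (5, 50, 200)}
```

The peak stays near 1.18 however many terms are included; only its position moves toward the discontinuity.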


11.3.6 Overshoot at a Discontinuous Point 


Owing to Gibbs phenomena, a Fourier series representation is highly unreliable in the vicinity of a discontinuity. We now consider the resulting degree of error when we represent a function f(x) having a discontinuity by a Fourier series.

The maximum overshoot can be evaluated analytically through the following procedure. Let us consider a finite sum of the Fourier series in the complex form

$$S_N(x) = \sum_{n=-N}^{N} c_n e^{inkx}, \qquad c_n = \frac{1}{\lambda}\int_0^\lambda f(x)\, e^{-inkx}\, dx,$$

which yields

$$S_N(x) = \frac{1}{\lambda}\int_{-x}^{\lambda - x} f(t+x)\, K_N(t)\, dt, \qquad K_N(t) = \frac{\sin\left[ \left(N + \frac12\right) kt \right]}{\sin\left(\frac12 kt\right)}. \tag{11.56}$$


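We consider the behavior of $S_N(x)$ in the vicinity of a discontinuity at $x = x_0$. We denote the jump of f(x) at this discontinuity by $\Delta f$ and the jump of its finite Fourier sum by $\Delta S_N$:

$$\Delta f = f(x_0 + \varepsilon) - f(x_0 - \varepsilon), \qquad \Delta S_N = S_N(x_0 + \varepsilon) - S_N(x_0 - \varepsilon),$$

where $\varepsilon$ is infinitesimal. We then have

$$\Delta S_N = \frac{1}{\lambda}\int_{-x_0-\varepsilon}^{\lambda - x_0 - \varepsilon} f(t + x_0 + \varepsilon)\, K_N(t)\, dt - \frac{1}{\lambda}\int_{-x_0+\varepsilon}^{\lambda - x_0 + \varepsilon} f(t + x_0 - \varepsilon)\, K_N(t)\, dt.$$

Owing to the periodicity of the integrand $f(t+x) K_N(t)$, we replace the range of integration as follows:

$$\Delta S_N = \frac{1}{\lambda}\int_{-\varepsilon}^{\lambda - \varepsilon} f(t + x_0 + \varepsilon)\, K_N(t)\, dt - \frac{1}{\lambda}\int_{\varepsilon}^{\lambda + \varepsilon} f(t + x_0 - \varepsilon)\, K_N(t)\, dt.$$

Hence, we have

$$\begin{aligned}
\Delta S_N &= \frac{1}{\lambda}\left( \int_{-\varepsilon}^{\varepsilon} + \int_{\varepsilon}^{\lambda - \varepsilon} \right) f(t + x_0 + \varepsilon)\, K_N(t)\, dt - \frac{1}{\lambda}\left( \int_{\varepsilon}^{\lambda - \varepsilon} + \int_{\lambda - \varepsilon}^{\lambda + \varepsilon} \right) f(t + x_0 - \varepsilon)\, K_N(t)\, dt \\
&= \frac{1}{\lambda}\int_{-\varepsilon}^{\varepsilon} \left[ f(t + x_0 + \varepsilon) - f(t + x_0 - \varepsilon) \right] K_N(t)\, dt \\
&\quad + \frac{1}{\lambda}\int_{\varepsilon}^{\lambda - \varepsilon} \left[ f(t + x_0 + \varepsilon) - f(t + x_0 - \varepsilon) \right] K_N(t)\, dt.
\end{aligned} \tag{11.57}$$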


The integrand of (11.57) gives zero for all values of t except near t = 0. Close to t = 0, the integrand has a somewhat large value because of (i) the jump of $f(t + x_0)$ at t = 0 and (ii) the significant contribution of $K_N(t)$ in the vicinity of t = 0. Hence, we can confine the integration to the small interval $(-\delta, +\delta)$ for which the difference in the square brackets in (11.57) is simply $\Delta f$. It now follows that

$$\Delta S_N \simeq \frac{\Delta f}{\lambda}\int_{-\delta}^{\delta} \frac{\sin\left\{ \left(N + \frac12\right) kt \right\}}{\sin\left(\frac12 kt\right)}\, dt \simeq \frac{4\Delta f}{\lambda}\int_0^{\delta} \frac{\sin\left\{ \left(N + \frac12\right) kt \right\}}{kt}\, dt, \tag{11.58}$$

where the sine in the denominator was approximated by its argument because of the smallness of t.

The value of $\Delta S_N$ depends crucially on the interval $\delta$, since the integrand in (11.58) rapidly alternates its sign as t increases. The reader may find the plot of the integrand in Fig. 11.8, where it is shown clearly that the major contribution to the integral comes from the interval $[0, \lambda/(2N+1)]$, where $\lambda/(2N+1)$ is the first zero of the integrand. Hence, if the upper limit is larger than $\lambda/(2N+1)$, the result of the integral will clearly decrease, because over each subsequent oscillation of the integrand the area below the horizontal axis is larger than that above. Therefore, if we are interested in the maximum overshoot of the finite sum $\Delta S_N$, we must set the upper limit equal to $\lambda/(2N+1)$.

Fig. 11.8. The integrand of (11.58)

It follows that the maximum overshoot is

$$\begin{aligned}
(\Delta S_N)_{\max} &\simeq \frac{4\Delta f}{\lambda}\int_0^{\lambda/(2N+1)} \frac{\sin\left\{ \left(N + \frac12\right) kt \right\}}{kt}\, dt \\
&= \frac{4\Delta f}{\lambda k}\int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi = \frac{2\Delta f}{\pi}\int_0^{\pi} \frac{\sin \xi}{\xi}\, d\xi \\
&\simeq 1.179\, \Delta f,
\end{aligned}$$

where we have set $\xi = \left(N + \frac12\right) kt$ and used $k\lambda = 2\pi$.

We thus conclude that the finite (large-N) sum approximation of the discontinuous function overshoots the function itself at a discontinuity by about 18% in this case. This means that the Fourier series tends to overshoot the positive corner by some 18% and to undershoot the negative corner by the same amount. The inclusion of more terms (increasing N) does nothing to remove this overshoot but merely moves it closer to the point of discontinuity.
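
The numerical factor 1.179 comes from the integral of $\sin\xi/\xi$ over $[0, \pi]$. A minimal sketch of that evaluation is given below; it assumes a simple composite midpoint rule (chosen here only because it avoids evaluating the integrand at $\xi = 0$) and is not taken from the original text.

```python
import math

def sine_integral(upper, steps=10000):
    """Composite midpoint rule for int_0^upper sin(s)/s ds."""
    h = upper / steps
    return h * sum(math.sin((j + 0.5) * h) / ((j + 0.5) * h) for j in range(steps))

# (2/pi) * int_0^pi sin(xi)/xi d(xi): the ratio (Delta S_N)_max / Delta f.
overshoot_factor = (2.0 / math.pi) * sine_integral(math.pi)
```

The result agrees with the value 1.179 quoted above to the displayed precision.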


Exercises 


1. Let f(x) be absolutely integrable and form the Fourier series of f(x) in the interval $(-\pi, \pi)$. Show that the convergence of its Fourier series at a specified point x within the interval depends only on the behavior of f in the immediate vicinity of this point. (This result is referred to as the localization theorem.)


Solution: We use the integral formula for the partial sums

$$S_n(x) = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x+u)\, \frac{\sin mu}{2\sin(u/2)}\, du = \frac{1}{\pi}\int_{-\delta}^{\delta} f(x+u)\, \frac{\sin mu}{2\sin(u/2)}\, du + I_1 + I_2,$$

where we have set $m = n + (1/2)$. Here $\delta$ is an arbitrarily small positive number, and $I_1$, $I_2$ are the integrals over the intervals $[\delta, \pi]$ and $[-\pi, -\delta]$, respectively. On these intervals, the function $1/[2\sin(u/2)]$ is continuous (since $|u| \ge \delta$) and, therefore, the function

$$\phi(u) = \frac{f(x+u)}{2\sin(u/2)}$$

is absolutely integrable. It then follows from the Riemann–Lebesgue theorem that the integral

$$I_1 = \frac{1}{\pi}\int_{\delta}^{\pi} \phi(u)\sin mu\, du$$

approaches zero as $m \to \infty$. The same is true of $I_2$. Thus, whether or not the partial sums of the Fourier series have a limit at the point x depends on the behavior of the integral

$$\frac{1}{\pi}\int_{-\delta}^{\delta} f(x+u)\, \frac{\sin mu}{2\sin(u/2)}\, du$$

as $m \to \infty$, which involves only the values of the function f(x) in the neighborhood $[x - \delta, x + \delta]$ of the point x. This completes the proof. ♣


2. Let $f(x) = -\log|2\sin(x/2)|$, which is even and becomes infinite at $x = 2k\pi$ $(k = 0, \pm 1, \pm 2, \cdots)$.

(i) Show that f(x) is integrable.

(ii) Calculate the Fourier series of f(x).

(iii) Derive the identity: $\log 2 = 1 - (1/2) + (1/3) - (1/4) + \cdots$.


Solution:

(i) The given f(x) equals zero at $x = \pi/3$ and is $2\pi$-periodic. Hence, to prove the integrability of f(x), it suffices to show that it is integrable on the interval $[0, \pi/3]$. Clearly we have

$$-\int_{\varepsilon}^{\pi/3} \log\left| 2\sin\frac{x}{2} \right| dx = \varepsilon \log\left( 2\sin\frac{\varepsilon}{2} \right) + \int_{\varepsilon}^{\pi/3} \frac{x\cos(x/2)}{2\sin(x/2)}\, dx,$$

where we have dropped the absolute value sign, since $2\sin(x/2) > 0$ for $0 < x < \pi/3$. As $\varepsilon \to 0$, the quantity $\varepsilon\log[2\sin(\varepsilon/2)]$ approaches zero, which is verified by using l’Hôpital’s rule (see Sect. 1.4.1), whereas the last integral converges since the integrand is bounded. (Recall that $\lim_{x\to 0} x/[2\sin(x/2)] = 1$.) Thus, $-\int_0^{\pi/3} \log|2\sin(x/2)|\, dx$ exists, i.e., f(x) is integrable on the interval $[0, \pi/3]$.


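(ii) Since f(x) is even, we have $b_n = 0$ $(n = 1, 2, \cdots)$ and

$$a_n = -\frac{2}{\pi}\int_0^{\pi} \log\left( 2\sin\frac{x}{2} \right) \cos nx\, dx \quad (n = 0, 1, 2, \cdots).$$

For $n \neq 0$, integrating by parts and then applying l’Hôpital’s rule, we get

$$a_n = \frac{1}{n\pi}\int_0^{\pi} \frac{\sin nx\, \cos(x/2)}{\sin(x/2)}\, dx \quad (n = 1, 2, \cdots),$$

and then use the identity $2\sin nx\cos(x/2) = \sin[n + (1/2)]x + \sin[n - (1/2)]x$ to obtain

$$a_n = \frac{1}{n\pi}\int_0^{\pi} \frac{\sin[n + (1/2)]x}{2\sin(x/2)}\, dx + \frac{1}{n\pi}\int_0^{\pi} \frac{\sin[n - (1/2)]x}{2\sin(x/2)}\, dx = \frac{1}{n} \quad (n = 1, 2, \cdots).$$

For n = 0, we have

$$a_0 = -\frac{2}{\pi}\int_0^{\pi} \log\left( 2\sin\frac{x}{2} \right) dx = -\frac{2}{\pi}\int_0^{\pi} \left( \log 2 + \log\sin\frac{x}{2} \right) dx = -\frac{2}{\pi}\left[ \pi\log 2 + \int_0^{\pi} \log\left( \sin\frac{x}{2} \right) dx \right].$$

The last integral, denoted by I, reads

$$\begin{aligned}
I &= 2\int_0^{\pi/2} \log(\sin t)\, dt = 2\int_0^{\pi/2} \log\left( 2\sin\frac{t}{2}\cos\frac{t}{2} \right) dt \\
&= \pi\log 2 + 2\int_0^{\pi/2} \log\left( \sin\frac{t}{2} \right) dt + 2\int_0^{\pi/2} \log\left( \cos\frac{t}{2} \right) dt.
\end{aligned}$$

The substitution $t = \pi - u$ gives $\int_0^{\pi/2} \log[\cos(t/2)]\, dt = \int_{\pi/2}^{\pi} \log[\sin(u/2)]\, du$, which implies that $I = \pi\log 2 + 2I$, i.e., $I = -\pi\log 2$. Consequently, $a_0 = 0$.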


(iii) Since the function f(x) is obviously differentiable for $x \neq 2k\pi$ $(k = 0, \pm 1, \pm 2, \cdots)$, it follows that

$$-\log\left| 2\sin\frac{x}{2} \right| = \cos x + \frac{\cos 2x}{2} + \frac{\cos 3x}{3} + \cdots \tag{11.59}$$

for $x \neq 2k\pi$ $(k = 0, \pm 1, \pm 2, \cdots)$. Setting $x = \pi$ in (11.59), we obtain the desired result. ♣


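3. Show that

$$\int_{-x}^{\lambda - x} \frac{e^{i(N+\frac12)kt}}{\sin\left(\frac12 kt\right)}\, dt = i\lambda. \tag{11.60}$$

Solution: Recall an alternative form of $S_N(x)$ given in (11.40):

$$S_N(x) = \frac{1}{\lambda}\int_{-x}^{\lambda - x} f(t+x) \left( \sum_{n=-N}^{N} e^{-inkt} \right) dt. \tag{11.61}$$

Setting f(t) = 1 in (11.61) and (11.52) and comparing them, we have

$$0 + \frac{1}{i\lambda}\int_{-x}^{\lambda - x} \frac{e^{\frac{i}{2}kt}\, e^{iNkt}}{\sin\left(\frac12 kt\right)}\, dt = \frac{1}{\lambda}\int_{-x}^{\lambda - x} \left( \sum_{n=-N}^{N} e^{-inkt} \right) dt = 1,$$

since only the n = 0 term survives the integration over one period. This is equivalent to (11.60). ♣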


11.4 Applications in Physics and Engineering 


11.4.1 Temperature Variation of the Ground 


The most important applications of Fourier series expansions in the physi- 
cal sciences are in solving partial differential equations that describe a 
wide variety of physical phenomena. In this section, two typical examples of 
such applications are presented, while more rigorous discussions on partial 
differential equations are given in Chap. 17. 

First, we consider the temperature variation of the ground exposed to sunlight. The temperature at a depth of x meters at time t, denoted by u(x, t), is known to be determined by the diffusion equation

$$\frac{\partial u}{\partial t} = \kappa\, \frac{\partial^2 u}{\partial x^2}. \tag{11.62}$$

Here, the proportionality constant $\kappa$ is called the thermal diffusivity and its magnitude in the ground is roughly estimated at $\kappa = 3.0 \times 10^{-6}\ \mathrm{m^2/s}$. We will see below that the Fourier series expansion provides a means of solving equation (11.62) and clarifying the physical interpretation of its solution.

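Suppose that the temperature of the land surface, u(x = 0, t), changes periodically with a period T; the period T may range from a day to a year. It is then reasonable to express u(x, t) by the Fourier series

$$u(x, t) = \sum_{n=-\infty}^{\infty} c_n(x)\, e^{in\omega t} \qquad \left( \omega = \frac{2\pi}{T} \right).$$

Substitute this into (11.62) to obtain

$$in\omega\, c_n(x) = \kappa\, \frac{d^2 c_n(x)}{dx^2},$$

which implies

$$c_n(x) = c_n(0)\exp\left[ -(1 \pm i)\sqrt{\frac{|n|\omega}{2\kappa}}\; x \right],$$

with the upper sign for n > 0 and the lower sign for n < 0. Here we have chosen the solutions that behave as $|c_n(x)| \to 0$ in the limit of $x \to \infty$. In order to obtain the zeroth term $c_0(x)$, we note that

$$\frac{d^2 c_0(x)}{dx^2} = 0,$$

and thus

$$c_0(x) = A_0 + B_0 x.$$

Owing to the condition that $|c_0(x)|$ must remain finite in the limit of $x \to \infty$, we see that $B_0 = 0$ and $A_0 = \text{const}$. As a result, we obtain

$$u(x, t) = A_0 + 2\sum_{n=1}^{\infty} A_n e^{-\alpha_n x}\cos(n\omega t - \alpha_n x + \phi_n), \tag{11.63}$$

where

$$\alpha_n = \sqrt{\frac{n\omega}{2\kappa}}$$

and the constants $A_n$ and $\phi_n$ are determined by the t-dependence of the surface temperature u(x = 0, t).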

Note the presence of the parameter $\alpha_n$ in the general solution (11.63). It indicates that a wave component with the period T/n has the following features: (i) decay of the wave amplitude by $e^{-\alpha_n x}$ with an increase in x, and (ii) a phase shift by $\alpha_n x$ relative to the surface temperature u(x = 0, t).

Let us quantify the actual value of $\alpha_n$. For this, we consider the case of T = 1 day (i.e., 60 × 60 × 24 s) and assume monochromatic variation of the surface temperature given by

$$u(0, t) = 15 + 5\cos\left( \frac{2\pi}{T}\, t \right)\ {}^{\circ}\mathrm{C}.$$

Comparing this with (11.63) with x = 0, we get $A_0 = 15$, $A_1 = 5/2$, and $A_n = 0$ for $n \ge 2$. Then, since

$$\alpha_1 = \sqrt{\frac{2 \times 3.14}{2 \times (3.0 \times 10^{-6}) \times (60 \times 60 \times 24)}} \simeq 3.5\ \mathrm{m^{-1}},$$

we have

$$u(x, t) = 15 + 5e^{-3.5x}\cos\left( \frac{2\pi}{T}\, t - 3.5x \right).$$

A three-dimensional plot of u(x, t) in the x-t plane is shown in Fig. 11.9. We observe that at depths greater than 1 m, the temperature variation is almost in antiphase to that at the surface (x = 0) and the amplitude decreases considerably.
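
The numbers in this example are easy to reproduce. The sketch below is an illustration, not part of the original text: it evaluates $\alpha_1 = \sqrt{\omega/(2\kappa)}$ for the quoted $\kappa$ and a one-day period, and then the amplitude and phase lag of the daily temperature wave at a depth of 1 m. The function and variable names are hypothetical.

```python
import math

KAPPA = 3.0e-6        # m^2/s, value quoted in the text
T_DAY = 60 * 60 * 24  # s, period of one day

def alpha(n, kappa=KAPPA, period=T_DAY):
    """alpha_n = sqrt(n * omega / (2 * kappa)) with omega = 2*pi/period, in 1/m."""
    omega = 2 * math.pi / period
    return math.sqrt(n * omega / (2 * kappa))

def temperature(x, t, period=T_DAY):
    """u(x, t) = 15 + 5 exp(-alpha_1 x) cos(2 pi t / T - alpha_1 x), in deg C."""
    a1 = alpha(1)
    return 15.0 + 5.0 * math.exp(-a1 * x) * math.cos(2 * math.pi * t / period - a1 * x)

a1 = alpha(1)                           # decay constant of the daily wave
amplitude_at_1m = 5.0 * math.exp(-a1)   # remaining swing at x = 1 m, in deg C
phase_lag_at_1m = a1 * 1.0              # phase shift at x = 1 m, in radians
```

At 1 m depth the swing of 5 °C is reduced to roughly 0.15 °C, and the phase lag of about 3.5 rad is close to $\pi$, which is the near-antiphase behavior described above.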


11.4.2 String Vibration Under Impact 


The second example is the vibration of an elastic string subject to an impact force in a local region. Consider the case of a piano wire under an impact force applied by a hammer. Suppose that an impulse I is applied at the position x = a of a suspended string with length $\ell$ and mass density $\rho$. The vibrational amplitude of the string, denoted by u(x, t), is governed by the wave equation

$$\frac{\partial^2 u}{\partial t^2} = c^2\, \frac{\partial^2 u}{\partial x^2}. \tag{11.64}$$

The string is initially assumed to be stationary, i.e.,

$$u(x, t = 0) = 0. \tag{11.65}$$


The initial velocity of the line element at x is denoted by v(x). Then, the law of the conservation of momentum states that

$$\int_0^{\ell} \rho\, v(x)\, dx = I, \tag{11.66}$$

where

$$v(x) = V \cdot \delta(x - a), \tag{11.67}$$

with an appropriate constant V. From (11.66) and (11.67), we have $V = I/\rho$. Furthermore, since

$$v(x) = \left. \frac{\partial u}{\partial t} \right|_{t=0},$$

we have

$$\left. \frac{\partial u}{\partial t} \right|_{t=0} = \frac{I}{\rho}\, \delta(x - a). \tag{11.68}$$

Fig. 11.9. Temperature variation u(x, t) of the underground below x meters on t days (vertical axis: temperature variation [°C]; horizontal axis: date t)




Under the two initial conditions (11.65) and (11.68), the general solution of (11.64) is given by

$$u(x, t) = \sum_{n=1}^{\infty} A_n \sin k_n x\, \sin(\omega_n t + \phi_n), \tag{11.69}$$

where

$$k_n = \frac{n\pi}{\ell}, \qquad \omega_n = c k_n.$$

The constants $A_n$ and $\phi_n$ in (11.69) are again determined by the initial conditions. First, imposing the condition u(x, t = 0) = 0 on (11.69) implies

$$A_n \sin\phi_n = 0 \quad \text{for all } n, \tag{11.70}$$

owing to the linear independence of $\{\sin k_n x\}$. Next, it follows from (11.69) that

$$\left. \frac{\partial u}{\partial t} \right|_{t=0} = \sum_{n=1}^{\infty} A_n \omega_n \cos\phi_n \sin k_n x = \frac{I}{\rho}\, \delta(x - a).$$

Multiplying both sides by $\sin k_m x$ and then integrating yields

$$A_m \omega_m \cos\phi_m \int_0^{\ell} \sin^2 k_m x\, dx = \frac{I}{\rho}\sin k_m a \quad \text{for all } m. \tag{11.71}$$

From (11.70) and (11.71), we finally obtain

$$\phi_n = 0 \quad \text{and} \quad A_n = \frac{2I}{\rho\,\ell\,\omega_n}\sin k_n a \quad \text{for all } n. \tag{11.72}$$

The second expression in (11.72) implies that the position x = a that satisfies $\sin k_n a = 0$ yields $A_n = 0$; i.e., the nth vibration mode is not excited by the impulsive force applied at x = a that satisfies $\sin k_n a = 0$. In contrast, if we apply an impulsive force at x = a satisfying $|\sin k_n a| = 1$, the corresponding nth mode will have a large vibrational amplitude, as is actually the case inside a piano.
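
The selection rule contained in (11.72) can be made concrete with a few lines of code. The sketch below is an added illustration with assumed unit values for the string length, mass density, wave speed, and impulse (none of these numbers come from the text). It evaluates the mode amplitudes for a strike at the midpoint of the string.

```python
import math

def mode_amplitude(n, a, length=1.0, rho=1.0, c=1.0, impulse=1.0):
    """A_n = 2 I sin(k_n a) / (rho * length * omega_n), with k_n = n*pi/length
    and omega_n = c * k_n. The default parameter values are arbitrary."""
    k_n = n * math.pi / length
    omega_n = c * k_n
    return 2.0 * impulse * math.sin(k_n * a) / (rho * length * omega_n)

# Strike at the midpoint a = length/2: amplitudes of modes n = 1, ..., 6.
amps_mid = [mode_amplitude(n, a=0.5) for n in range(1, 7)]
```

With the strike at the midpoint, every even mode has a node at x = a and is not excited, while the odd modes have $|\sin k_n a| = 1$ and amplitudes falling off as 1/n.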


12 


Fourier Transformation 


Abstract Fourier transformation is an effective tool for confirming the dual na- 
ture of a complex-valued function (as well as a real-valued one). Furthermore, the 
transformation enables us to measure certain correlations of a function with itself 
or with other functions; thus a Fourier transform can be applied to probability the- 
ory, signal analysis, etc. In this chapter we also provide the essence of a discrete 
Fourier transform (Sect. 12.3), which refers to a Fourier transform applied to a dis- 
crete complex-valued series. A discrete Fourier transform is commonly used in the 
numerical computation of Fourier transforms because of its computational efficiency. 


12.1 Fourier Transform 


12.1.1 Derivation of Fourier Transform 


The properties of Fourier series that we have already developed are adequate 
for handling the expansion of any periodic function. Nevertheless, there are 
many problems in physics and engineering that do not involve periodic func- 
tions, so it is important to generalize Fourier series to include nonperiodic 
functions. A nonperiodic function can be considered as a limit of a given 
periodic function whose period becomes infinite. 

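Let us write the Fourier series representing a periodic function f(x) in complex form:

$$f(x) = \sum_{n=-\infty}^{\infty} c_n e^{ikx} \tag{12.1}$$

with the definition $k = 2n\pi/\lambda$, in which

$$c_n = \frac{1}{\lambda}\int_{-\lambda/2}^{\lambda/2} f(x)\, e^{-ikx}\, dx.$$

We then introduce the quantity

$$\Delta k = \frac{2\pi}{\lambda}\, \Delta n. \tag{12.2}$$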



From the definition (12.2), the adjacent values of k are obtained by setting $\Delta n = 1$, which corresponds to $(\lambda/2\pi)\Delta k = 1$. Therefore, multiplying each term on the right-hand side of (12.1) by $(\lambda/2\pi)\Delta k$ yields

$$f(x) = \sum_{n=-\infty}^{\infty} c_\lambda(k)\, e^{ikx}\, \Delta k, \tag{12.3}$$

where

$$c_\lambda(k) \equiv \frac{\lambda}{2\pi}\, c_n = \frac{1}{2\pi}\int_{-\lambda/2}^{\lambda/2} f(x)\, e^{-ikx}\, dx.$$

In the limit as $\lambda \to \infty$, the ks are distributed continuously instead of discretely, i.e., $\Delta k \to dk$. Thus, the sum in (12.3) becomes exactly the definition of an integral. As a result, we arrive at the conclusion

$$c(k) \equiv \lim_{\lambda\to\infty} c_\lambda(k) = \frac{1}{2\pi}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx \tag{12.4}$$

and

$$f(x) = \int_{-\infty}^{\infty} c(k)\, e^{ikx}\, dk. \tag{12.5}$$

Further, by defining $F(k) = \sqrt{2\pi}\, c(k)$, equations (12.4) and (12.5) take the symmetrical form given below, known as the Fourier transform or Fourier integral representation of f(x).


♠ Fourier transform:
The Fourier transform of f(x) is defined by

$$F(k) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(x)\, e^{-ikx}\, dx. \tag{12.6}$$


♠ Inverse Fourier transform:
The inverse Fourier transform of F(k) given above is defined by

$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} F(k)\, e^{ikx}\, dk. \tag{12.7}$$


We often write the expressions (12.6) and (12.7) in simpler form:

$$F(k) = \mathcal{F}[f(x)] \quad \text{and} \quad f(x) = \mathcal{F}^{-1}[F(k)].$$

Observe that F(k) as well as f(x) are, in general, complex-valued functions of the real variables k and x, respectively. Yet, if f(x) is real, then

$$F(-k) = F^*(k),$$

which gives two immediate corollaries (proofs are left to the reader):


♠ Fourier integral theorem:
1. If f(x) is real and even, F(k) is real.

2. If f(x) is real and odd, F(k) is purely imaginary.
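
These symmetry properties can be observed numerically. The sketch below is an added illustration, not the authors' method: it approximates (12.6) by a Riemann sum on a wide symmetric grid and applies it to two test functions chosen here, a Gaussian (real and even) and x times a Gaussian (real and odd). The helper name and grid parameters are assumptions.

```python
import numpy as np

def fourier_transform(f, k, half_width=20.0, samples=20001):
    """Riemann-sum approximation of F(k) = (2 pi)^(-1/2) int f(x) exp(-i k x) dx,
    truncated to [-half_width, half_width] (adequate for rapidly decaying f)."""
    x = np.linspace(-half_width, half_width, samples)
    dx = x[1] - x[0]
    return (f(x) * np.exp(-1j * k * x)).sum() * dx / np.sqrt(2 * np.pi)

def gauss(x):
    return np.exp(-x**2 / 2)        # real and even

def odd_fn(x):
    return x * np.exp(-x**2 / 2)    # real and odd

F_even = fourier_transform(gauss, 1.0)
F_odd = fourier_transform(odd_fn, 1.0)
F_odd_neg = fourier_transform(odd_fn, -1.0)
```

For the Gaussian, the computed F(1) is real and matches the known transform $e^{-k^2/2}$ at k = 1; for the odd function, F(1) is purely imaginary, and F(-1) is its complex conjugate.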


12.1.2 Fourier Integral Theorem 


Our derivations of the Fourier transform and its inverse transform, (12.6) and (12.7), have not been rigorous from a mathematical viewpoint. For developing exact derivations and clarifying the conditions for the infinite integrals in (12.6) and (12.7) to converge, the following theorem is of crucial importance:


♠ Fourier integral theorem:
If f(x) is piecewise smooth and absolutely integrable, then

$$\frac{1}{2}\left[ f(x+0) + f(x-0) \right] = \frac{1}{\pi}\int_0^{\infty} \left[ \int_{-\infty}^{\infty} f(t)\cos u(x - t)\, dt \right] du. \tag{12.8}$$


Remark. The theorem is valid for each fixed x, so x can be considered a constant insofar as the integrations are concerned.


Before starting the proof of the theorem, we note that (12.8) reduces to the form of (12.6) and (12.7) when x is a continuous point of f(x). To see this, we make use of the identity

$$\int_0^{\infty} \cos u(x - t)\, du = \frac{1}{2}\int_{-\infty}^{\infty} e^{iu(x-t)}\, du. \tag{12.9}$$

Since (12.8) reads

$$f(x) = \frac{1}{\pi}\int_{-\infty}^{\infty} f(t)\, dt \int_0^{\infty} \cos u(x - t)\, du, \tag{12.10}$$

we substitute (12.9) into (12.10) to obtain

$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{iux}\, du \int_{-\infty}^{\infty} f(t)\, e^{-iut}\, dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} F(u)\, e^{iux}\, du,$$

where

$$F(u) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\, e^{-itu}\, dt.$$

These results are clearly equivalent to the forms of (12.6) and (12.7).




12.1.3 Proof of the Fourier Integral Theorem 


The proof of the Fourier integral theorem is based on the following two 
lemmas: 


@ Lemma 1: If f(x) is piecewise smooth for all x ∈ R, then

\[ \lim_{\xi \to \infty} \int_0^b f(x)\, \frac{\sin \xi x}{x}\,dx = \frac{\pi}{2} f(0+) \quad (b > 0). \]

@ Lemma 2: If f(x,t) is a continuous function of t for a < t < b and
$\lim_{M \to \infty} \int_0^M f(x,t)\,dx$ exists and converges uniformly to a certain function
g(t) in the interval, then g(t) is continuous in the interval and

\[ \int_a^b g(t)\,dt = \int_a^b \left[ \int_0^{\infty} f(x,t)\,dx \right] dt = \int_0^{\infty} \left[ \int_a^b f(x,t)\,dt \right] dx. \]

Note that f(0+) in Lemma 1 denotes the limiting value of f(x) as x tends
to zero through positive values. The proof of Lemma 1 is left to Exercise 3.
Lemma 2 follows from the fact that uniform convergence allows us to
interchange the order of limiting and integration procedures (see Chapter 3 for
details).

We are now ready to prove the Fourier integral theorem expressed by (12.8). 


Proof (of the Fourier integral theorem): Let f(x) be piecewise
smooth and absolutely integrable. Consider the integral

\[ \int_{-\infty}^{\infty} f(t) \cos u(x-t)\,dt. \]

Since |cos u(x − t)| ≤ 1, convergence of this integral is ensured by
our hypothesis that $\int_{-\infty}^{\infty} |f(t)|\,dt$ converges, and since this conclusion is
independent of u and x, the convergence is uniform for all u. Therefore,
in view of Lemma 2, we can interchange the order of integration in

\[ I = \int_0^b \left[ \int_{-\infty}^{\infty} f(t) \cos u(x-t)\,dt \right] du \]

to obtain

\[ I = \int_{-\infty}^{\infty} f(t) \left[ \int_0^b \cos u(x-t)\,du \right] dt = \int_{-\infty}^{\infty} f(t)\, \frac{\sin b(x-t)}{x-t}\,dt. \]

We now decompose this into four integrals:

\[ I = \left( \int_{-\infty}^{-M} + \int_{-M}^{x} + \int_{x}^{M} + \int_{M}^{\infty} \right) f(t)\, \frac{\sin b(x-t)}{x-t}\,dt, \tag{12.11} \]

where M is taken to be so large that the first and the last integrals
in (12.11) are less in absolute value than some prescribed ε > 0. By
changing variables, taking u = t − x, we can write the third integral
in (12.11) as

\[ \int_0^{M-x} \frac{\sin bu}{u}\, f(x+u)\,du. \]

In view of Lemma 1, this tends to πf(x + 0)/2 as b → ∞. Similarly,
the second integral tends to πf(x − 0)/2. Therefore, by taking M
sufficiently large, we obtain

\[ \left| \lim_{b \to \infty} I - \frac{\pi \left[ f(x+0) + f(x-0) \right]}{2} \right| < 2\varepsilon, \]

or equivalently, since ε is arbitrary,

\[ \int_0^{\infty} \left[ \int_{-\infty}^{\infty} f(t) \cos u(x-t)\,dt \right] du = \frac{\pi \left[ f(x+0) + f(x-0) \right]}{2}. \]

This completes the proof of the theorem. ■


12.1.4 Inverse Relations of the Half-width 


In practice, we often encounter functions f(x) having a sharp peak at a specific
point, say x = 0. The width of the peak of such a function is closely related
to the width of the peak that is exhibited by the resulting Fourier transform
$F(k) = \mathcal{F}[f(x)]$. A typical example of this phenomenon is seen by considering
the Fourier transform of a Gaussian function $f(x) = a e^{-bx^2}$ with a, b > 0, i.e.,

\[ F(k) = \frac{a}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-bx^2} e^{-ikx}\,dx = \frac{a\, e^{-k^2/(4b)}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\,dx. \]

We substitute y = x + ik/(2b) to evaluate the integral as

\[ \int_{-\infty}^{\infty} e^{-b[x + ik/(2b)]^2}\,dx = \int_{-\infty}^{\infty} e^{-by^2}\,dy = \sqrt{\frac{\pi}{b}}, \]

and we get

\[ F(k) = \frac{a}{\sqrt{2b}}\, e^{-k^2/(4b)}, \]

which is also Gaussian. It is noteworthy that the width of f(x), which is
proportional to 1/√b, is in inverse relation to the width of F(k), which is
proportional to √b. Therefore, increasing the width of f(x) results in a decrease
in the width of F(k). In the limit of infinite width (a constant function), we
get infinite sharpness (the delta function). In fact, denoting the widths as Δx
and Δk, we have Δx Δk ~ 1.


@ Inverse relation of the half-width: 

When f(x) consists of a single peak whose width is characterized by Δx,
its Fourier transform F(k) is also a single-peak function with a width Δk,
which yields Δx Δk ~ 1.
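The Gaussian result above can be checked numerically. The following is a minimal sketch, assuming the symmetric 1/√(2π) convention of (12.6); the helper name `fourier_transform`, the grid, and the parameter values a = 1.5, b = 2 are illustrative choices that do not come from the text.

```python
import numpy as np

def fourier_transform(f_vals, x, k):
    """Approximate F(k) = (1/sqrt(2*pi)) * integral of f(x) exp(-i k x) dx
    by a Riemann sum on the uniform grid x."""
    dx = x[1] - x[0]
    return np.sum(f_vals * np.exp(-1j * k * x)) * dx / np.sqrt(2.0 * np.pi)

a, b = 1.5, 2.0                       # illustrative amplitude and sharpness
x = np.linspace(-20.0, 20.0, 40001)   # wide enough for the tails to vanish
f_vals = a * np.exp(-b * x**2)

k_values = np.array([0.0, 0.5, 1.0, 3.0])
numeric = np.array([fourier_transform(f_vals, x, k) for k in k_values])
analytic = a / np.sqrt(2.0 * b) * np.exp(-k_values**2 / (4.0 * b))

max_error = np.max(np.abs(numeric - analytic))
print(max_error)
```

Increasing b narrows f(x) and broadens the computed F(k), which is the inverse relation of the widths described above.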


For the second example, we evaluate the Fourier transform of a box function
defined by

\[ f(x) = \begin{cases} b, & \text{if } |x| < a, \\ 0, & \text{if } |x| > a. \end{cases} \]

From the definition, we have

\[ F(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\,dx = \frac{b}{\sqrt{2\pi}} \int_{-a}^{a} e^{-ikx}\,dx = \frac{2ab}{\sqrt{2\pi}} \left( \frac{\sin ka}{ka} \right). \]

Observe again that the width of f(x), Δx = 2a, is in inverse relation to the
width of F(k), which is roughly the distance between its first two roots, k_+
and k_−, on either side of k = 0: Δk = k_+ − k_− = 2π/a. In addition, if a → ∞,
the function f(x) becomes a constant function over the entire real line, and
we get

\[ F(k) = \frac{2b}{\sqrt{2\pi}} \lim_{a \to \infty} \frac{\sin ka}{k} = \sqrt{2\pi}\, b\, \delta(k). \]

Otherwise, if b → ∞ and a → 0 in such a way that 2ab [the area under the
graph of f(x)] remains fixed at unity, then f(x) approaches the delta function
and F(k) becomes

\[ F(k) = \lim_{a \to 0,\; b \to \infty} \frac{2ab}{\sqrt{2\pi}}\, \frac{\sin ka}{ka} = \frac{1}{\sqrt{2\pi}}. \]


12.1.5 Parseval Identity for Fourier Transforms 


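If F(k) and G(k) are Fourier transforms of f(x) and g(x), respectively, we
have

\[
\begin{aligned}
\int_{-\infty}^{\infty} f(x)\, g^*(x)\,dx
&= \int_{-\infty}^{\infty} dx \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(k)\, e^{ikx}\,dk \right] \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} G^*(k')\, e^{-ik'x}\,dk' \right] \\
&= \int_{-\infty}^{\infty} dk\, F(k) \int_{-\infty}^{\infty} dk'\, G^*(k') \left[ \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(k-k')x}\,dx \right] \\
&= \int_{-\infty}^{\infty} dk\, F(k) \int_{-\infty}^{\infty} dk'\, G^*(k')\, \delta(k'-k) \\
&= \int_{-\infty}^{\infty} F(k)\, G^*(k)\,dk,
\end{aligned}
\tag{12.12}
\]

or similarly,

\[ \int_{-\infty}^{\infty} f(x)\, g(x)\,dx = \int_{-\infty}^{\infty} F(k)\, G(-k)\,dk. \tag{12.13} \]

In particular, if we set g(x) = f(x) in (12.12), we have

\[ \int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |F(k)|^2\,dk. \tag{12.14} \]

Here |F(k)|² is referred to as the power spectrum of the function f(x).
Equation (12.14), or the more general (12.13), is known as the Parseval
identity for Fourier integrals.

Remark. A sufficient condition for interchanging the order of integration in
(12.12) is the absolute convergence of the integrals $\int_{-\infty}^{\infty} F(k)\, e^{ikx}\,dk$ and
$\int_{-\infty}^{\infty} G^*(k')\, e^{-ik'x}\,dk'$.

Parseval's identity is very useful for understanding the physical interpretation
of the transform function F(k) when the physical significance of f(x) is known,
as illustrated in the following example: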


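Example The displacement of a damped harmonic oscillator as a function
of time is given by

\[ f(t) = \begin{cases} 0 & \text{for } t < 0, \\ e^{-t/\tau} \sin \omega_0 t & \text{for } t \ge 0. \end{cases} \]

With the normalization of (12.6), the Fourier transform of this function is given by

\[
\begin{aligned}
F(\omega) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} 0 \times e^{-i\omega t}\,dt + \frac{1}{\sqrt{2\pi}} \int_0^{\infty} e^{-t/\tau} \sin \omega_0 t\; e^{-i\omega t}\,dt \\
&= \frac{1}{\sqrt{2\pi}}\, \frac{1}{2i} \int_0^{\infty} \left[ e^{-i(\omega - \omega_0)t - t/\tau} - e^{-i(\omega + \omega_0)t - t/\tau} \right] dt \\
&= \frac{1}{2\sqrt{2\pi}} \left( \frac{1}{\omega + \omega_0 - i/\tau} - \frac{1}{\omega - \omega_0 - i/\tau} \right).
\end{aligned}
\]

The physical interpretation of |F(ω)|² is the energy content per unit frequency
interval (i.e., the energy spectrum), while |f(t)|² is proportional to the sum of
the kinetic and potential energies of the oscillator. Hence, Parseval's identity,
expressed by

\[ \int_{-\infty}^{\infty} |f(t)|^2\,dt = \int_{-\infty}^{\infty} |F(\omega)|^2\,d\omega, \]

shows the equivalence of these two alternative specifications for the total
energy to within a constant.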
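The equality of the two energy integrals in this example can be illustrated numerically. The sketch below assumes that F(ω) carries the 1/√(2π) prefactor of (12.6); the values τ = 1 and ω₀ = 5, the integration cutoffs, and the helper `trapezoid` are illustrative assumptions.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0

tau, omega0 = 1.0, 5.0   # illustrative damping time and natural frequency

# Time-domain side: integral of |f(t)|^2 over t >= 0 (f vanishes for t < 0).
t = np.linspace(0.0, 40.0 * tau, 400001)
f = np.exp(-t / tau) * np.sin(omega0 * t)
energy_time = trapezoid(f**2, t)

# Frequency-domain side, using the closed form of F(omega) with the
# 1/sqrt(2*pi) prefactor; the cutoff at |omega| = 400 truncates a ~1/omega^4 tail.
w = np.linspace(-400.0, 400.0, 800001)
F = (1.0 / (2.0 * np.sqrt(2.0 * np.pi))) * (
    1.0 / (w + omega0 - 1j / tau) - 1.0 / (w - omega0 - 1j / tau)
)
energy_freq = trapezoid(np.abs(F)**2, w)

# Elementary closed form of the time-domain integral, for comparison.
exact = tau / 4.0 - tau / (4.0 * (1.0 + (omega0 * tau)**2))
print(energy_time, energy_freq, exact)
```

If the prefactor 1/√(2π) is dropped from F(ω), the two integrals differ by the constant factor 2π, which is the sense of "to within a constant" above.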


12.1.6 Fourier Transforms in Higher Dimensions 


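The concept of the Fourier transform can be extended naturally to more than
one dimension. For example, in three dimensions we can define the Fourier
transform of f(x, y, z) as

\[ F(k_x, k_y, k_z) = \frac{1}{(2\pi)^{3/2}} \iiint f(x, y, z)\, e^{-ik_x x} e^{-ik_y y} e^{-ik_z z}\,dx\,dy\,dz \tag{12.15} \]

and its inverse as

\[ f(x, y, z) = \frac{1}{(2\pi)^{3/2}} \iiint F(k_x, k_y, k_z)\, e^{ik_x x} e^{ik_y y} e^{ik_z z}\,dk_x\,dk_y\,dk_z. \tag{12.16} \]

Denoting the vector with components k_x, k_y, k_z by k and that with
components x, y, z by r, we can write the Fourier transform pair (12.15), (12.16) as
follows:

@ Fourier transforms in three dimensions:

\[ F(\mathbf{k}) = \frac{1}{(2\pi)^{3/2}} \int f(\mathbf{r})\, e^{-i\mathbf{k}\cdot\mathbf{r}}\,d\mathbf{r}, \qquad f(\mathbf{r}) = \frac{1}{(2\pi)^{3/2}} \int F(\mathbf{k})\, e^{i\mathbf{k}\cdot\mathbf{r}}\,d\mathbf{k}. \]

It is instructive to evaluate the Fourier transform of a function f(r) under the
condition that the system possesses spherical symmetry, i.e., f(r) = f(r) with r = |r|. We
employ spherical coordinates in which the vector k of the Fourier transform
lies along the polar axis (θ = 0). We then have

\[ d\mathbf{r} = r^2 \sin\theta\,dr\,d\theta\,d\phi \quad \text{and} \quad \mathbf{k}\cdot\mathbf{r} = kr\cos\theta, \]

where k = |k|. The Fourier transform is then given by

\[ F(k) = \frac{1}{(2\pi)^{3/2}} \int f(r)\, e^{-i\mathbf{k}\cdot\mathbf{r}}\,d\mathbf{r} = \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} dr \int_0^{\pi} d\theta \int_0^{2\pi} d\phi\; f(r)\, r^2 \sin\theta\; e^{-ikr\cos\theta}. \]

The integral over θ may be straightforwardly evaluated by noting that

\[ \frac{d}{d\theta}\, e^{-ikr\cos\theta} = ikr \sin\theta\; e^{-ikr\cos\theta}. \]

Therefore,

\[ F(k) = \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} dr\; 2\pi f(r)\, r^2 \left[ \frac{e^{-ikr\cos\theta}}{ikr} \right]_{\theta=0}^{\theta=\pi} = \frac{1}{(2\pi)^{3/2}} \int_0^{\infty} 4\pi r^2 f(r) \left( \frac{\sin kr}{kr} \right) dr. \]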
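The radial formula for a spherically symmetric function can be tested on a case with a known answer. As a hedged sketch: under the (2π)^(-3/2) convention of (12.15), the three-dimensional transform of exp(−r²/2) is exp(−k²/2). The function names, the cutoff `r_max`, and the grid size below are illustrative assumptions.

```python
import numpy as np

def radial_transform_3d(f, k, r_max=20.0, n=200001):
    """Evaluate (2*pi)**-1.5 * integral_0^inf 4*pi*r^2 f(r) sin(kr)/(kr) dr
    with the trapezoidal rule on [0, r_max]."""
    r = np.linspace(0.0, r_max, n)
    # np.sinc(z) = sin(pi z)/(pi z), so sin(kr)/(kr) = np.sinc(k r / pi);
    # this form is also well defined at r = 0 and k = 0.
    integrand = 4.0 * np.pi * r**2 * f(r) * np.sinc(k * r / np.pi)
    dr = r[1] - r[0]
    integral = (np.sum(integrand) - 0.5 * (integrand[0] + integrand[-1])) * dr
    return integral / (2.0 * np.pi)**1.5

def gaussian(r):
    """Spherically symmetric test function exp(-r^2/2)."""
    return np.exp(-r**2 / 2.0)

k_values = [0.0, 1.0, 2.5]
errors = [abs(radial_transform_3d(gaussian, k) - np.exp(-k**2 / 2.0))
          for k in k_values]
print(errors)
```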


Remark. A similar result may be obtained for two-dimensional Fourier
transforms in which f(r) = f(ρ), i.e., f(r) is independent of the azimuthal
angle φ. In this case, we find

\[ F(k) = \int_0^{\infty} \rho f(\rho) J_0(k\rho)\,d\rho, \]

where J_0(x) is the zeroth-order Bessel function.


Exercises 


1. Show that if f(x) is piecewise continuous over (a, b), then

\[ \lim_{\xi \to \infty} \int_a^b f(x) \sin \xi x\,dx = 0. \]

Solution: If f has a continuous derivative, this is easily proved;
we integrate by parts to obtain

\[ \int_a^b f(x) \cos \xi x\,dx = \left[ \frac{f(x) \sin \xi x}{\xi} \right]_a^b - \frac{1}{\xi} \int_a^b f'(x) \sin \xi x\,dx, \]

which tends to zero as ξ → ∞ since the integral on the right-hand
side is bounded. If f is not differentiable, let p be a continuously
differentiable function such that $\int_a^b |f(x) - p(x)|\,dx < \varepsilon$. Then

\[ \left| \int_a^b [f(x) - p(x)] \cos \xi x\,dx \right| \le \int_a^b |f(x) - p(x)|\, |\cos \xi x|\,dx \le \int_a^b |f(x) - p(x)|\,dx < \varepsilon \]

independently of ξ, and as the preceding discussion gave us
$\int_a^b p(x) \cos \xi x\,dx \to 0$, it follows that $\int_a^b f(x) \cos \xi x\,dx \to 0$ as well.
The proof that $\int_a^b f(x) \sin \xi x\,dx \to 0$ is similar. ■


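2. Show that

\[ \int_0^{\infty} \frac{\sin x}{x}\,dx = \frac{\pi}{2}. \]

Solution: If we substitute λ = 2π, x = π into (11.60) and note
that the integrand is an odd function, it follows that

\[ \int_0^{\pi} \frac{\sin\left( \frac{2n+1}{2} u \right)}{\sin(u/2)}\,du = \pi. \tag{12.17} \]

Applying the result of Exercise 1 noted above to the function
[(2/u) − 1/sin(u/2)] (which is bounded in 0 < u < π), we have

\[ \lim_{n \to \infty} \int_0^{\pi} \sin\left( \frac{2n+1}{2} u \right) \left[ \frac{2}{u} - \frac{1}{\sin(u/2)} \right] du = 0. \tag{12.18} \]

Summing (12.17) and (12.18), we obtain

\[ \lim_{n \to \infty} \int_0^{\pi} \frac{2 \sin\left( \frac{2n+1}{2} u \right)}{u}\,du = \pi. \]

Changing variables and letting t = (2n + 1)u/2, we get

\[ \lim_{n \to \infty} \int_0^{(2n+1)\pi/2} \frac{\sin t}{t}\,dt = \frac{\pi}{2}. \]

We already know that $\int_0^M (\sin t)/t\,dt$ tends to a limit as M → ∞,
which completes our proof. ■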


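3. Show that

\[ \lim_{\lambda \to \infty} \int_0^b f(x)\, \frac{\sin \lambda x}{x}\,dx = \frac{\pi}{2} f(0+) \quad \text{for } b > 0 \]

whenever f is piecewise smooth.

Solution: Observe that

\[ \int_0^b f(x)\, \frac{\sin \lambda x}{x}\,dx = f(0+) \int_0^b \frac{\sin \lambda x}{x}\,dx + \int_0^b \frac{f(x) - f(0+)}{x}\, \sin \lambda x\,dx. \]

From the result of Exercise 1, the last integral tends to zero as
λ → ∞, since the function [f(x) − f(0+)]/x is piecewise smooth in the interval
0 < x < b. It also remains bounded in this interval since, as x
tends to zero, [f(x) − f(0+)]/x tends to f′(0+). In the other integral, the
substitution t = λx gives $\int_0^{\lambda b} (\sin t)/t\,dt$, which tends to π/2 by Exercise 2,
so that it tends to the desired value. ■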


12.2 Convolution and Correlations 


12.2.1 Convolution Theorem 


In the application of the Fourier transform, we often encounter a product
such as F(k)G(k), where each of the two functions is the Fourier transform of a
function f(x) and g(x), respectively. Here, we are interested in finding out
how the inverse Fourier transform of the product, denoted by

\[ \mathcal{F}^{-1}[F(k)G(k)], \]

is related to the individual inverse functions

\[ \mathcal{F}^{-1}[F(k)] = f(x) \quad \text{and} \quad \mathcal{F}^{-1}[G(k)] = g(x). \]


To begin with, we introduce a key concept called convolution and then state 
an important theorem that plays a central role in the discussion of the matter. 


@ Convolution: 
The convolution of the functions f(x) and g(x), denoted by f * g, is
defined by

\[ f * g \equiv \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(u)\, g(x-u)\,du. \tag{12.19} \]

The convolution obeys the commutative, associative, and distributive
laws of algebra, i.e., if we have functions f_1, f_2, f_3, then

\[
\begin{aligned}
f_1 * f_2 &= f_2 * f_1 && \text{(Commutative)}, \\
f_1 * (f_2 * f_3) &= (f_1 * f_2) * f_3 && \text{(Associative)}, \\
f_1 * (f_2 + f_3) &= (f_1 * f_2) + (f_1 * f_3) && \text{(Distributive)}.
\end{aligned}
\tag{12.20}
\]


We are now ready to prove the following important theorem regarding the 
product F(k)G(k) of two Fourier transforms. 


@ Convolution theorem: 
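If F(k) and G(k) are Fourier transforms of f(x) and g(x), respectively,
then

\[ F(k)\,G(k) = \mathcal{F}[f * g]. \tag{12.21} \]

Proof It follows from the definition of the Fourier transform that

\[ F(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, e^{-ikx}\,dx, \qquad G(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x')\, e^{-ikx'}\,dx', \]

which yields

\[ F(k)\,G(k) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, g(x')\, e^{-ik(x+x')}\,dx\,dx'. \tag{12.22} \]

Let x + x′ = u in the double integral of (12.22) and transform the independent variables
from (x, x′) to (x, u). We thus have

\[ dx\,dx' = \frac{\partial(x, x')}{\partial(x, u)}\,dx\,du, \]

where the Jacobian of the transformation is

\[ \frac{\partial(x, x')}{\partial(x, u)} = \begin{vmatrix} \dfrac{\partial x}{\partial x} & \dfrac{\partial x}{\partial u} \\[2mm] \dfrac{\partial x'}{\partial x} & \dfrac{\partial x'}{\partial u} \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ -1 & 1 \end{vmatrix} = 1. \]

Then (12.22) becomes

\[
\begin{aligned}
F(k)\,G(k) &= \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f(x)\, g(u-x)\, e^{-iku}\,dx\,du \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-iku} \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(x)\, g(u-x)\,dx \right\} du \\
&= \mathcal{F}[f * g]. \quad ■
\end{aligned}
\tag{12.23}
\]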
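The convolution theorem can be illustrated numerically for two Gaussians. The following is a minimal sketch that assumes the 1/√(2π) factors of (12.6) and (12.19); the test functions exp(−x²) and exp(−2x²), the grid, and the helper `ft` are illustrative choices.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)   # symmetric grid with an odd number of points
dx = x[1] - x[0]
f = np.exp(-x**2)
g = np.exp(-2.0 * x**2)

# (f * g)(x) = (1/sqrt(2*pi)) * integral of f(u) g(x - u) du, as in (12.19).
# With mode="same" on this symmetric grid, the output is sampled on x itself.
conv = np.convolve(f, g, mode="same") * dx / np.sqrt(2.0 * np.pi)

def ft(values, k):
    """Riemann-sum approximation of the Fourier transform (12.6) at wavenumber k."""
    return np.sum(values * np.exp(-1j * k * x)) * dx / np.sqrt(2.0 * np.pi)

k_values = [0.0, 0.7, 1.5, 3.0]
product = np.array([ft(f, k) * ft(g, k) for k in k_values])   # F(k) G(k)
transform_of_conv = np.array([ft(conv, k) for k in k_values]) # F[f * g](k)

max_diff = np.max(np.abs(product - transform_of_conv))
print(max_diff)
```

Without the factor 1/√(2π) in the definition of the convolution, the two sides would differ by √(2π), which is why the prefactor appears in (12.19).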


12.2.2 Cross-Correlation Functions 


There are several important functions related to the convolution, which are 
called correlation functions (see below) and auto-correlation functions 
(see Sect. 12.2.3). 


@ Cross-correlation function: 
The cross-correlation of two functions f and g is defined by

\[ c(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(x)\, g(x+z)\,dx. \tag{12.24} \]


Despite the apparent similarity between the cross-correlation function (12.24)
and the definition of convolution (12.19), their uses and interpretations are
very different: the cross-correlation provides a quantitative measure of the
similarity of two functions f and g as one is displaced through a distance
z relative to the other.


Remark. Similar to the convolution, the cross-correlation is both associative 
and distributive. Unlike the convolution, however, it is not commutative. 


We arrive at an important theorem by considering the Fourier transform of 
(12.24): 


@ Wiener—Kinchin theorem: 
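The Fourier transform of the cross-correlation of f and g, as defined in (12.24),
is equal to the product of F*(k) and G(k), i.e.,

\[ \mathcal{F}[c(z)] = C(k) = F^*(k)\,G(k). \tag{12.25} \]

Proof From the definitions (12.6) and (12.24), we have

\[
\begin{aligned}
C(k) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dz\, e^{-ikz} \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(x)\, g(x+z)\,dx \right\} \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f^*(x) \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(x+z)\, e^{-ikz}\,dz \right\}.
\end{aligned}
\]

Making the substitution u = x + z in the second integral, we obtain

\[
\begin{aligned}
C(k) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} dx\, f^*(x) \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(u)\, e^{-ik(u-x)}\,du \right\} \\
&= \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(x)\, e^{ikx}\,dx \right\} \left\{ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} g(u)\, e^{-iku}\,du \right\} \\
&= F^*(k)\,G(k). \quad ■
\end{aligned}
\tag{12.26}
\]

It readily follows from the definition (12.24) and the theorem (12.25) that

\[ c(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} C(k)\, e^{ikz}\,dk = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F^*(k)\,G(k)\, e^{ikz}\,dk. \tag{12.27} \]

Then, setting z = 0 gives us the multiplication theorem

\[ \int_{-\infty}^{\infty} f^*(x)\, g(x)\,dx = \int_{-\infty}^{\infty} F^*(k)\,G(k)\,dk. \tag{12.28} \]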


Further, by letting g = f, we arrive at the following identity: 


@ Plancherel identity: 
A function f(x) and its Fourier transform F(k) are related to one another
by the identity

\[ \int_{-\infty}^{\infty} |f(x)|^2\,dx = \int_{-\infty}^{\infty} |F(k)|^2\,dk, \tag{12.29} \]

which is called the Plancherel identity.


Plancherel's identity is sometimes called Parseval's identity, owing to the
analogy with Fourier series.


12.2.3 Autocorrelation Functions 


Particularly when g(x) = f(x), the cross-correlation function c(z) is referred
to specifically as follows:

@ Autocorrelation function:
The autocorrelation function of f(x) is defined by

\[ a(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f^*(x)\, f(x+z)\,dx. \]

Using the Wiener–Kinchin theorem (12.26), we see that

\[ a(z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} A(k)\, e^{ikz}\,dk = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F^*(k)\,F(k)\, e^{ikz}\,dk = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} |F(k)|^2\, e^{ikz}\,dk, \]

where A(k) denotes the Fourier transform of a(z). This implies that the quantity
|F(k)|², called the power spectrum of f(x), is the Fourier transform of the
autocorrelation function, as formally stated below.

@ Power spectrum:
Given f(x), we have

\[ |F(k)|^2 = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} a(z)\, e^{-ikz}\,dz, \]

where F(k) and a(z) are, respectively, the Fourier transform and the
autocorrelation function of f(x).

This result is frequently made use of in practical applications of Fourier
transforms.
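In numerical practice, the relation between the autocorrelation and the power spectrum is usually exploited in its discrete, periodic form. The sketch below is such a discrete analogue rather than the continuous statement itself: it assumes a finite sequence treated as periodic (circular autocorrelation) and NumPy's unnormalized forward transform `numpy.fft.fft`; the random test data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.normal(size=64) + 1j * rng.normal(size=64)   # illustrative complex samples
N = f.size

# Circular autocorrelation a[m] = sum_k conj(f[k]) * f[(k + m) mod N].
# np.roll(f, -m)[k] equals f[(k + m) mod N].
a_direct = np.array([np.sum(np.conj(f) * np.roll(f, -m)) for m in range(N)])

# Discrete power spectrum |F_n|^2 and its inverse transform.
power = np.abs(np.fft.fft(f))**2
a_from_spectrum = np.fft.ifft(power)

print(np.max(np.abs(a_direct - a_from_spectrum)))
```

Computing the autocorrelation through the power spectrum costs two fast Fourier transforms instead of the N² products of the direct sum.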


12.3 Discrete Fourier Transform 


12.3.1 Definitions 


The present section includes several topics associated with numerical
computation of Fourier transforms. Generally, in computational work, we do not
treat a continuous function f(t), but rather f(t_k) given by a discrete set of
t_k's. (For now, we assume that a physical process of interest is described in
the time domain.) In most common situations, the value of f(t) is recorded
at evenly spaced intervals. In this context, we have to estimate the Fourier
transform of a function from a finite number of its sampled points.

Suppose that we have a set of measurements performed at equal time
intervals of Δ. Then the sequence of sampled values is given by

\[ f_k = f(t_k), \quad t_k = k\Delta \quad (k = 0, 1, 2, \cdots, N-1). \tag{12.30} \]

For simplicity, we assume that N is even. With N numbers of input, we can
produce at most N independent numbers of output. So, instead of trying to
estimate the Fourier transform F(ω) in the whole range of frequency ω, we
seek estimates only at the discrete values ω = ω_n with n = 0, 1, ···, N − 1.
By analogy with the Fourier transform for a continuous function f(t), we may
define the Fourier transform for a discrete set of f_k = f(t_k) (k = 0, 1, ···, N − 1)
as below.


@ Discrete Fourier transform: 
The discrete Fourier transform for a discrete set of f_k given by (12.30)
is defined by

\[ F_n = F(\omega_n) = \sum_{k=0}^{N-1} f_k\, e^{-i\omega_n t_k} = \sum_{k=0}^{N-1} f_k\, e^{-2\pi i k n/N} \tag{12.31} \]

with the definition

\[ \omega_n = \frac{2\pi n}{N\Delta} \quad (n = 0, 1, \cdots, N-1). \tag{12.32} \]


Note that F_n is associated with frequency ω_n. Of importance is the fact that
although n in (12.31) can formally be any integer from −∞ to ∞, it suffices to let n
run from 0 to N − 1, just as k does. This restriction is possible because F_n is periodic
in n with a period of N terms. In fact, for any integer n such that 0 ≤ n ≤ N − 1,
we have

\[ F_n = F_{n+N} = F_{n+2N} = \cdots, \]

as readily follows from (12.31).


12.3.2 Inverse Transform 


Given the discrete transform F_n, we can reproduce the time series f_k with the
aid of the inverse relationship:

@ Inverse of discrete Fourier transform:
The discrete Fourier transform of a set {f_k} satisfies the relation

\[ f_k = \frac{1}{N} \sum_{n=0}^{N-1} F_n\, e^{2\pi i k n/N}. \tag{12.33} \]

Proof For the proof, it suffices to observe that

\[ \sum_{n=0}^{N-1} e^{-2\pi i n (k-k')/N} = \begin{cases} N & (k = k'), \\ 0 & (\text{otherwise}) \end{cases} \tag{12.34} \]

(see Exercise 1 in Sect. 12.3). Then, from (12.31) and (12.34), we have

\[ \frac{1}{N} \sum_{n=0}^{N-1} F_n\, e^{2\pi i n k'/N} = \frac{1}{N} \sum_{n=0}^{N-1} \sum_{k=0}^{N-1} f_k\, e^{-2\pi i n (k-k')/N} = \frac{1}{N} \sum_{k=0}^{N-1} f_k\, N \delta_{kk'} = f_{k'}. \quad ■ \]

Note that the only differences between expressions (12.31) and (12.33) for
F_n and f_k, respectively, are (i) changing the sign in the exponential, and (ii)
dividing the answer by N. This means that a computational procedure for
calculating discrete Fourier transforms can, with slight modifications, also be
used to calculate the inverse transform. In addition, we see from the inverse
transform that only N values of the frequency ω_n are needed and that their
index n ranges from 0 to N − 1, just as with the discrete time t_k.
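The transform pair (12.31) and (12.33) can be evaluated directly as matrix-vector products. The following is a minimal O(N²) sketch; the function names `dft` and `idft` and the sample values are illustrative, and the comparison relies on `numpy.fft.fft` using the same sign and normalization as (12.31).

```python
import numpy as np

def dft(f):
    """Direct evaluation of F_n = sum_k f_k exp(-2 pi i k n / N), as in (12.31)."""
    N = len(f)
    n = np.arange(N).reshape(-1, 1)   # output index (rows)
    k = np.arange(N).reshape(1, -1)   # input index (columns)
    return np.exp(-2j * np.pi * n * k / N) @ np.asarray(f, dtype=complex)

def idft(F):
    """Inverse relation f_k = (1/N) sum_n F_n exp(+2 pi i k n / N), as in (12.33)."""
    N = len(F)
    k = np.arange(N).reshape(-1, 1)
    n = np.arange(N).reshape(1, -1)
    return np.exp(2j * np.pi * k * n / N) @ np.asarray(F, dtype=complex) / N

samples = np.array([1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 1.5])
spectrum = dft(samples)
recovered = idft(spectrum)
print(np.max(np.abs(recovered - samples)))
```

The two functions differ only in the sign of the exponent and the division by N, as noted in the text.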




12.3.3 Nyquist Frequency and Aliasing


In the above discussion, we have taken the view that the index n in (12.31)
varies from 0 to N − 1. In this convention, n in F_n and k in f_k vary over exactly
the same range, so the mapping of N numbers into N numbers is manifest.
Alternatively, since the quantity F_n given in (12.31) is periodic in n with
period N (i.e., F_n = F_{N+n}), n in F_n is allowed to vary from −N/2 to (N/2) − 1.
In the latter convention, the discrete Fourier transform and its inverse
transform read, respectively,

\[ F_n = \sum_{k=-N/2}^{N/2-1} f_k\, e^{-2\pi i k n/N} \quad \text{and} \quad f_k = \frac{1}{N} \sum_{n=-N/2}^{N/2-1} F_n\, e^{2\pi i k n/N}. \tag{12.35} \]

Emphasis is placed on the fact that in (12.35), the upper bound of the
summation is not N/2 but (N/2) − 1. This keeps the number of ω_n equal to
N. Indeed, the periodicity of F_n in n with the period N implies that the
discretized frequency ω_n = 2πn/(NΔ) is also effectively periodic in n with period N. Hence,
the two extreme values of ω_n, i.e.,

\[ \omega_{-N/2} = -\frac{\pi}{\Delta} \quad \text{and} \quad \omega_{N/2} = \frac{\pi}{\Delta}, \]

contribute to F_n as given in (12.31) in the same way. These two
indistinguishable frequencies are known as the Nyquist critical frequencies.


@ Nyquist critical frequency: 
A Nyquist critical frequency is defined by

\[ \omega_c = \frac{\pi}{\Delta}, \]

where Δ is the sampling interval: t_k = kΔ (k = 0, 1, ···, N − 1).


The Nyquist critical frequency has the following peculiarity. Suppose that we
sample a sine wave of the Nyquist critical frequency, expressed by

\[ f(t) = \sin(\omega_c t + \theta), \]

at the sampling interval Δ. Then we have

\[ f_k = f(t_k) = \sin(\omega_c t_k + \theta) = \sin\left( \frac{\pi}{\Delta}\, k\Delta + \theta \right) = \sin(k\pi + \theta) \quad (k = 0, 1, \cdots, N-1), \]

where θ is determined by the initial condition: f(0) = sin θ. Then, the
sampling becomes two sample points per cycle: sin θ and −sin θ.

The above arguments further suggest that discretized frequencies ω_n above ω_c
(and below −ω_c) are identified with ω_{n−N} (and ω_{n+N}). This phenomenon,
peculiar to discrete sampling, leads to the following important consequence:


@ Aliasing: 

When a continuous function f(t) is sampled with an interval Δ, all of the
power spectral density lying outside of the range [−ω_c, ω_c) with ω_c = π/Δ
is moved into that range. This phenomenon is called aliasing.


Through discrete sampling, therefore, any frequency component outside of the
range [−ω_c, ω_c) is falsely translated into that range.


Example Suppose that two continuous waves exp(iω_1 t) and exp(iω_2 t) are
sampled with the same interval Δ. Then, if ω_2 = ω_1 ± 2ω_c, we obtain the same
samples, since

\[ \exp(i\omega_2 t_k) = \exp(i\omega_1 t_k) \times \exp(\pm 2i\omega_c t_k) = \exp(i\omega_1 t_k) \times \exp(\pm 2k\pi i) = \exp(i\omega_1 t_k), \]

where t_k = kΔ (k = 0, 1, ···, N − 1). Hence, a sinusoidal wave having a
frequency lying outside the range [−ω_c, ω_c) appears the same as the sinusoidal
wave whose frequency is within the range.
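This example is easy to reproduce numerically. In the sketch below, the sampling interval Δ = 0.1, the number of samples, and the choice ω_1 = 0.3 ω_c are illustrative assumptions.

```python
import numpy as np

delta = 0.1                          # sampling interval (illustrative)
omega_c = np.pi / delta              # Nyquist critical frequency
N = 16
t = delta * np.arange(N)             # sampling times t_k = k * delta

omega_1 = 0.3 * omega_c              # inside [-omega_c, omega_c)
omega_2 = omega_1 + 2.0 * omega_c    # outside the range, shifted by 2 * omega_c

samples_1 = np.exp(1j * omega_1 * t)
samples_2 = np.exp(1j * omega_2 * t)

# The two waves differ between the sampling times but agree at every t_k.
print(np.max(np.abs(samples_1 - samples_2)))
```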


Remark. The way to overcome aliasing is to (i) know the natural bandwidth 
limit of the signal — or else enforce a known limit by analog filtering of the 
continuous signal, and then (ii) sample at a rate sufficiently rapid to give at 
least two points per cycle of the highest frequency present. 


12.3.4 Sampling Theorem 


We present below a famous theorem that is useful in certain applications of 
the discrete Fourier transform. 


@ Sampling theorem: 

Suppose that a continuous function f(t) is sampled at an interval Δ as
f_k = f(kΔ). If its Fourier transform satisfies the condition that F(ω) = 0
for all |ω| > ω_c = π/Δ, then we have

\[ f(t) = \sum_{k=-\infty}^{\infty} f_k\, \frac{\sin\left[ \omega_c (t - k\Delta) \right]}{\omega_c (t - k\Delta)}. \]


This theorem states that if a signal f(t) in question is bandwidth-limited
(i.e., F(ω) = 0 for |ω| > ω_0) with a certain preassigned frequency ω_0,
then the entire information content of the signal can be recorded by sampling
it at the interval Δ = π/ω_0.
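The sampling series can be tried out on a simple bandwidth-limited signal. As a sketch under stated assumptions: we take f(t) = (sin t / t)², whose transform vanishes for |ω| > 2, sample it with Δ = 1 (so that ω_c = π > 2), and truncate the infinite sum at |k| ≤ K. The signal, Δ, K, and the function names are illustrative choices.

```python
import numpy as np

delta = 1.0   # sampling interval, so that omega_c = pi / delta

def signal(t):
    """(sin t / t)^2, written with np.sinc(z) = sin(pi z)/(pi z); band limit |omega| <= 2."""
    return np.sinc(t / np.pi)**2

def reconstruct(t, K=2000):
    """Truncated sampling series: sum over k = -K..K of f_k * sinc-kernel."""
    k = np.arange(-K, K + 1)
    # sin[omega_c (t - k delta)] / [omega_c (t - k delta)] = np.sinc((t - k delta)/delta)
    return np.sum(signal(k * delta) * np.sinc((t - k * delta) / delta))

test_times = [0.3, 1.7, -4.25]       # points between the sampling times
errors = [abs(reconstruct(t) - signal(t)) for t in test_times]
print(errors)
```

The residual error comes only from truncating the sum; at a sampling time t = kΔ the series reproduces f_k exactly because all other kernels vanish there.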


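Proof Given a continuous function f(t), we express it by the inverse Fourier
transform as

\[ f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\,d\omega. \]

By hypothesis, F(ω) vanishes for |ω| > ω_c so that

\[ f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\omega_c}^{\omega_c} F(\omega)\, e^{i\omega t}\,d\omega, \]

which yields

\[ f(t_k) = \frac{1}{\sqrt{2\pi}} \int_{-\omega_c}^{\omega_c} F(\omega)\, e^{i\omega t_k}\,d\omega \quad \text{for } t_k = k\Delta \ (k \in \mathbb{Z}). \]

Consider the Fourier series expansion of F(ω) on the interval of length 2ω_c,

\[ F(\omega) = \sum_{k=-\infty}^{\infty} c_k\, e^{-i\omega t_k} \quad \text{for } |\omega| < \omega_c, \tag{12.36} \]

where the coefficients c_k read

\[ c_k = \frac{1}{2\omega_c} \int_{-\omega_c}^{\omega_c} F(\omega)\, e^{i\omega t_k}\,d\omega = \frac{\sqrt{2\pi}}{2\omega_c}\, f(t_k) = \frac{\Delta}{\sqrt{2\pi}}\, f(t_k). \tag{12.37} \]

From (12.36) and (12.37), we obtain

\[ F(\omega) = \frac{\Delta}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} f(t_k)\, e^{-i\omega t_k} \quad \text{for } |\omega| < \omega_c. \]

Now we define

\[ H(\omega) = \frac{\Delta}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} f(t_k)\, e^{-i\omega t_k} \quad \text{for all } \omega. \]

While the function H(ω) is a periodic function with period 2ω_c, F(ω) is
identically zero outside the interval [−ω_c, ω_c]. This being so, we can write

\[ F(\omega) = H(\omega)\, S(\omega) \quad \text{with} \quad S(\omega) = \begin{cases} 1 & |\omega| \le \omega_c, \\ 0 & |\omega| > \omega_c. \end{cases} \]

Thus we have

\[ F(\omega) = \frac{\Delta}{\sqrt{2\pi}} \sum_{k=-\infty}^{\infty} f(t_k)\, e^{-i\omega t_k}\, S(\omega), \]

and its inverse transform reads

\[
\begin{aligned}
f(t) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\,d\omega
= \frac{\Delta}{2\pi} \sum_{k=-\infty}^{\infty} f(t_k) \int_{-\omega_c}^{\omega_c} e^{i\omega (t - t_k)}\,d\omega \\
&= \sum_{k=-\infty}^{\infty} f(t_k)\, \frac{\sin\left[ \omega_c (t - t_k) \right]}{\omega_c (t - t_k)},
\end{aligned}
\]

where we used Δ/π = 1/ω_c in the last step. ■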


12.3.5 Fast Fourier Transform 


The fast Fourier transform (often abbreviated as FFT) is an algorithm for
calculating discrete Fourier transforms and is widely known as a useful tool
in computational physics. In this subsection, we demonstrate the efficiency of
this computational method.

In a typical discrete Fourier transform, one has a sum of N terms expressed
by

\[ F_n = \sum_{k=0}^{N-1} W^{nk} f_k, \tag{12.38} \]

where W is a complex number defined by

\[ W = e^{-2\pi i/N}. \]

Notably, the right-hand side of (12.38) can be regarded as a product of the
vector consisting of the elements {f_k} with a matrix whose (n, k)th element
is the constant W to the power n × k. The matrix multiplication produces
a vector whose components are the F_n's. This operation evidently requires
N² complex-number multiplications plus a smaller number of operations to
generate the required powers of W. Thus, the discrete Fourier transform
appears to be an O(N²) process.

The efficiency of the fast Fourier transform manifests in the fact that it
enables us to compute the discrete Fourier transform in O(N log₂ N) operations.
The difference between O(N²) and O(N log₂ N) is immense. With N = 10⁸,
e.g., it is the difference between, roughly, 2 s and 3 months of CPU time on a
gigahertz-cycle computer.

The fast Fourier transform is based on the fact that a discrete Fourier
transform of length N can be rewritten as the sum of two discrete Fourier
transforms, each of length N/2. This is easily seen from (12.38) as follows:

\[
\begin{aligned}
F_n &= \sum_{k=0}^{N-1} e^{-2\pi i n k/N} f_k \\
&= \sum_{k=0}^{N/2-1} e^{-2\pi i n (2k)/N} f_{2k} + \sum_{k=0}^{N/2-1} e^{-2\pi i n (2k+1)/N} f_{2k+1} \\
&= \sum_{k=0}^{N/2-1} e^{-2\pi i n k/(N/2)} f_{2k} + W^n \sum_{k=0}^{N/2-1} e^{-2\pi i n k/(N/2)} f_{2k+1} \\
&= F_n^{e} + W^n F_n^{o}.
\end{aligned}
\tag{12.39}
\]


Here W is the same complex constant we defined in (12.38). The F_n^e denotes
the nth component of the Fourier transform of the sequence (f_{2k}) with length
N/2 expressed by

\[ (f_{2k}) = (f_0, f_2, f_4, \cdots, f_{N-2}), \]

which consists of even components of the original f_k's. Similarly, the F_n^o is the
corresponding transform of length N/2 formed from odd components. Recall
that F_n is periodic in n with the period N. On the other hand, the transforms
F_n^e and F_n^o are periodic in n with period N/2. This period-reduction property
is the origin of the efficiency of the fast Fourier transform as demonstrated
below.

Having decomposed F_n into F_n^e and F_n^o, we can apply the same procedure
to F_n^e and F_n^o to produce transforms of the N/4 even-numbered and odd-numbered data:

\[ F_n^{e} = \sum_{k=0}^{N/4-1} e^{-2\pi i n k/(N/4)} f_{4k} + W^{2n} \sum_{k=0}^{N/4-1} e^{-2\pi i n k/(N/4)} f_{4k+2} = F_n^{ee} + W^{2n} F_n^{eo}, \tag{12.40} \]

\[ F_n^{o} = \sum_{k=0}^{N/4-1} e^{-2\pi i n k/(N/4)} f_{4k+1} + W^{2n} \sum_{k=0}^{N/4-1} e^{-2\pi i n k/(N/4)} f_{4k+3} = F_n^{oe} + W^{2n} F_n^{oo}. \tag{12.41} \]

Here, the F_n^{eo}, e.g., is the transform of the sequence (f_{4k+2}) given by

\[ (f_{4k+2}) = (f_2, f_6, \cdots, f_{N-2}), \]

whose length is N/4. We can continue the above procedure until we obtain
the transform of a single-point sequence, say,

\[ F_n^{eoeeoeo \cdots oee} = f_k \quad \text{for some } k. \tag{12.42} \]

This implies that for every pattern of log₂ N e's and o's, there is a one-point
transform that is just one of the input numbers f_k. Therefore, by relating
all the terms f_k (0 ≤ k ≤ N − 1) to patterns of log₂ N e's and o's and
then tracking back through the procedures (12.39), (12.40), (12.41), and (12.42) to
reproduce F_n, we will successfully obtain the discrete Fourier transform F_n
(0 ≤ n ≤ N − 1) of the original data f_k (0 ≤ k ≤ N − 1).

One may ask how we can figure out which value of
k corresponds to which pattern of e's and o's in (12.42). As we demonstrate
later, this can be achieved by reversing the pattern of e's and o's and setting
e = 0 and o = 1. Then, we have the corresponding value of k in a binary
expression. This idea of bit reversal can be exploited in a very clever way
that makes FFTs practical.
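The even/odd splitting of (12.39) translates almost literally into a recursive procedure. The following is a minimal sketch, assuming that N is a power of two and that W = exp(−2πi/N), i.e., the sign convention of (12.31); the function name `fft_recursive` is illustrative, and production FFT libraries use more elaborate iterative schemes.

```python
import numpy as np

def fft_recursive(f):
    """Radix-2 FFT built on F_n = F^e_n + W^n F^o_n with W = exp(-2 pi i / N)."""
    f = np.asarray(f, dtype=complex)
    N = len(f)
    if N == 1:
        return f.copy()                  # one-point transform: F_0 = f_0
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_recursive(f[0::2])        # F^e, transform of (f_0, f_2, ...)
    odd = fft_recursive(f[1::2])         # F^o, transform of (f_1, f_3, ...)
    twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # W^n for 0 <= n < N/2
    # F^e and F^o have period N/2 and W^(n + N/2) = -W^n, which gives
    # the second half of the output at no extra multiplication cost.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

rng = np.random.default_rng(1)
data = rng.normal(size=32) + 1j * rng.normal(size=32)
result = fft_recursive(data)
print(np.max(np.abs(result - np.fft.fft(data))))
```

Each level of the recursion performs O(N) work and there are log₂ N levels, which is the origin of the O(N log₂ N) operation count.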


12.3.6 Matrix Representation of FFT Algorithm 


To make our discussion more concrete, we now present an actual FFT
procedure to obtain the discrete Fourier transform F(n) (n = 0, 1, 2, 3) of the
original vector data f(k) (k = 0, 1, 2, 3). By definition, F(n) is given in the
matrix representation as

\[
\begin{bmatrix} F(0) \\ F(1) \\ F(2) \\ F(3) \end{bmatrix}
=
\begin{bmatrix}
W^0 & W^0 & W^0 & W^0 \\
W^0 & W^1 & W^2 & W^3 \\
W^0 & W^2 & W^4 & W^6 \\
W^0 & W^3 & W^6 & W^9
\end{bmatrix}
\begin{bmatrix} f(0) \\ f(1) \\ f(2) \\ f(3) \end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 & 1 \\
1 & W^1 & W^2 & W^3 \\
1 & W^2 & W^0 & W^2 \\
1 & W^3 & W^2 & W^1
\end{bmatrix}
\begin{bmatrix} f(0) \\ f(1) \\ f(2) \\ f(3) \end{bmatrix},
\tag{12.43}
\]

where we used the fact that

\[ W^4 = \left( e^{-2\pi i/4} \right)^4 = e^{-2\pi i} = 1. \]

More generally, we have

\[ W^{nk} = W^{nk \bmod N}, \]

where the number nk mod N
is the remainder when the integer nk is divided by N. The trick involved in
the FFT algorithm is to decompose the product of the vector and the matrix
appearing in (12.43) into that of a vector and two matrices:

\[
\begin{bmatrix} F(0) \\ F(2) \\ F(1) \\ F(3) \end{bmatrix}
=
\begin{bmatrix}
1 & W^0 & 0 & 0 \\
1 & W^2 & 0 & 0 \\
0 & 0 & 1 & W^1 \\
0 & 0 & 1 & W^3
\end{bmatrix}
\begin{bmatrix}
1 & 0 & W^0 & 0 \\
0 & 1 & 0 & W^0 \\
1 & 0 & W^2 & 0 \\
0 & 1 & 0 & W^2
\end{bmatrix}
\begin{bmatrix} f(0) \\ f(1) \\ f(2) \\ f(3) \end{bmatrix}.
\tag{12.44}
\]

The equivalence between (12.43) and (12.44) is verified in a straightforward
manner. Nevertheless, the reader should pay attention to the fact that in
(12.44), the order of elements in the vector F(n) is altered from that in the
original form (12.43). As we demonstrate later, this altering property of the
order of F(n) enables us to compute efficiently the F(n) from f(k) with the
help of the bit-reversing process.

The efficiency of FFT can be observed by counting up the number of
multiplications (and additions) between matrix elements needed to complete
the matrix operation given in (12.44). First we set

\[
\begin{bmatrix} f_1(0) \\ f_1(1) \\ f_1(2) \\ f_1(3) \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & W^0 & 0 \\
0 & 1 & 0 & W^0 \\
1 & 0 & W^2 & 0 \\
0 & 1 & 0 & W^2
\end{bmatrix}
\begin{bmatrix} f_0(0) \\ f_0(1) \\ f_0(2) \\ f_0(3) \end{bmatrix},
\]

in which f_0(k) = f(k) (k = 0, 1, 2, 3). Then f_1(0) is obtained through one
complex-number multiplication and one complex-number addition, i.e.,

\[ f_1(0) = f_0(0) + W^0 f_0(2). \tag{12.45} \]

We can obtain f_1(1) in the same manner as above. In contrast, to obtain
f_1(2), only one complex-number addition is needed due to the relation W² =
−W⁰. In fact,

\[ f_1(2) = f_0(0) + W^2 f_0(2) = f_0(0) - W^0 f_0(2), \]

in which the product W⁰ f_0(2) was evaluated earlier in the calculation of
(12.45). Likewise, f_1(3) is also computed by only one addition owing to the
same relation W² = −W⁰, since W⁰ f_0(3) is already known from f_1(1). As a
consequence, the vector f_1(k) (k = 0, 1, 2, 3) is
calculated through four additions and two multiplications.

A similar scenario can apply to the remaining computation:

\[
\begin{bmatrix} F(0) \\ F(2) \\ F(1) \\ F(3) \end{bmatrix}
=
\begin{bmatrix} f_2(0) \\ f_2(1) \\ f_2(2) \\ f_2(3) \end{bmatrix}
=
\begin{bmatrix}
1 & W^0 & 0 & 0 \\
1 & W^2 & 0 & 0 \\
0 & 0 & 1 & W^1 \\
0 & 0 & 1 & W^3
\end{bmatrix}
\begin{bmatrix} f_1(0) \\ f_1(1) \\ f_1(2) \\ f_1(3) \end{bmatrix}.
\]

Calculation of each number f_2(0) and f_2(2) requires both one addition and
one multiplication, whereas for f_2(1) and f_2(3) only one addition is required
because of the relations W² = −W⁰ and W³ = −W¹. Therefore, the entire
computation to yield F(n) in the above context requires four multiplications
and eight additions. This computational cost is significantly small
compared with the direct matrix calculation given in (12.43), where 16
multiplications and 12 additions are needed. More generally, when
considering the transform F(n) of the length N = 2^γ, the FFT procedure requires
Nγ/2 multiplications and Nγ additions, whereas the
direct matrix calculation procedure demands N² multiplications and
N(N − 1) additions. Thus the superiority of the FFT method is considerably
enhanced when N ≫ 1.
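The factorization of the 4 × 4 transform matrix into two sparse matrices can be confirmed by direct multiplication. The sketch below assumes W = exp(−2πi/4); the variable names are illustrative, and the same check succeeds with the opposite sign in the exponent because only the relations W⁴ = 1, W² = −W⁰, and W³ = −W¹ are used.

```python
import numpy as np

W = np.exp(-2j * np.pi / 4)   # W = exp(-2 pi i / N) with N = 4 (assumed sign convention)

# Full transform matrix of (12.43): element (n, k) is W^(n k).
n, k = np.meshgrid(np.arange(4), np.arange(4), indexing="ij")
dft_matrix = W ** (n * k)

# The two sparse factors of (12.44); stage_1 acts on the data first.
stage_1 = np.array([[1, 0, W**0, 0],
                    [0, 1, 0, W**0],
                    [1, 0, W**2, 0],
                    [0, 1, 0, W**2]])
stage_2 = np.array([[1, W**0, 0, 0],
                    [1, W**2, 0, 0],
                    [0, 0, 1, W**1],
                    [0, 0, 1, W**3]])

# The product reproduces the rows of the full matrix in the order F(0), F(2), F(1), F(3).
bit_reversed_rows = [0, 2, 1, 3]
product = stage_2 @ stage_1
print(np.max(np.abs(product - dft_matrix[bit_reversed_rows])))
```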


12.3.7 Decomposition Method for FFT 


It is still unclear as to how we can find an appropriate decomposition of general
N × N matrices as performed in (12.44). To see this, we express the indices
n and k in terms of two-digit expressions:

\[ n = 2n_1 + n_0, \qquad k = 2k_1 + k_0, \]

where each of n_1, n_0, k_1, k_0 takes the value 0 or 1 [e.g., n = 2 corresponds to
(n_1, n_0) = (1, 0)]. Then, the discrete Fourier transform reads

\[ F(n_1, n_0) = \sum_{k_0=0}^{1} \sum_{k_1=0}^{1} f_0(k_1, k_0)\, W^{(2n_1+n_0)(2k_1+k_0)}. \]

Now we apply the identity

\[ W^{(2n_1+n_0)(2k_1+k_0)} = W^{4n_1k_1}\, W^{2n_0k_1}\, W^{(2n_1+n_0)k_0} = W^{2n_0k_1}\, W^{(2n_1+n_0)k_0}, \]

which holds because W⁴ = 1, to obtain

\[ F(n_1, n_0) = \sum_{k_0=0}^{1} \left[ \sum_{k_1=0}^{1} f_0(k_1, k_0)\, W^{2n_0k_1} \right] W^{(2n_1+n_0)k_0}. \tag{12.46} \]

Denoting the sum in the square bracket by f_1(n_0, k_0), we have

\[ f_1(n_0, k_0) = \sum_{k_1=0}^{1} f_0(k_1, k_0)\, W^{2n_0k_1}, \tag{12.47} \]

or equivalently,

\[
\begin{aligned}
f_1(0,0) &= f_0(0,0) + f_0(1,0)\,W^0, \\
f_1(0,1) &= f_0(0,1) + f_0(1,1)\,W^0, \\
f_1(1,0) &= f_0(0,0) + f_0(1,0)\,W^2, \\
f_1(1,1) &= f_0(0,1) + f_0(1,1)\,W^2.
\end{aligned}
\]

This system of equations is expressed in matrix form as

\[
\begin{bmatrix} f_1(0,0) \\ f_1(0,1) \\ f_1(1,0) \\ f_1(1,1) \end{bmatrix}
=
\begin{bmatrix}
1 & 0 & W^0 & 0 \\
0 & 1 & 0 & W^0 \\
1 & 0 & W^2 & 0 \\
0 & 1 & 0 & W^2
\end{bmatrix}
\begin{bmatrix} f_0(0,0) \\ f_0(0,1) \\ f_0(1,0) \\ f_0(1,1) \end{bmatrix}.
\]

Similarly, from (12.46) and (12.47), it follows that

\[
\begin{bmatrix} f_2(0,0) \\ f_2(0,1) \\ f_2(1,0) \\ f_2(1,1) \end{bmatrix}
=
\begin{bmatrix}
1 & W^0 & 0 & 0 \\
1 & W^2 & 0 & 0 \\
0 & 0 & 1 & W^1 \\
0 & 0 & 1 & W^3
\end{bmatrix}
\begin{bmatrix} f_1(0,0) \\ f_1(0,1) \\ f_1(1,0) \\ f_1(1,1) \end{bmatrix}.
\]

Hence, we have

\[ F(n_1, n_0) = f_2(n_0, n_1), \]

in which the order of n_0 and n_1 in the parentheses differs on the two sides.
This indicates that the individual numbers f_2(n_0, n_1) are in order not of
n = 2n_1 + n_0, but of the numbers obtained by bit-reversing n, which is why
the bit-reversing process is required to obtain the discrete Fourier
transform F(n) using FFT. The above discussion also clearly demonstrates the
way to construct the decomposed product of matrices that makes the entire
computation fast.
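The bit-reversing process itself is a short integer manipulation. The following sketch uses a hypothetical helper `bit_reverse`, with the number of binary digits passed explicitly; it reproduces the output ordering F(0), F(2), F(1), F(3) found above for N = 4.

```python
def bit_reverse(n, bits):
    """Reverse the lowest `bits` binary digits of the non-negative integer n."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (n & 1)   # append the current lowest bit of n
        n >>= 1                            # and drop it from n
    return result

# Position j of the FFT output holds F(bit_reverse(j)).
order_4 = [bit_reverse(j, 2) for j in range(4)]
order_8 = [bit_reverse(j, 3) for j in range(8)]
print(order_4)
print(order_8)
```

Because reversing twice restores the original index, the same permutation both scrambles and unscrambles the data.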


Exercises 


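1. Show that

\[ \sum_{n=0}^{N-1} e^{-2\pi i n (k-k')/N} = \begin{cases} N & \text{if } k = k', \\ 0 & \text{otherwise}, \end{cases} \]

where k and k′ are integers ranging from 0 to N − 1.

Solution: The proof for the case of k = k′ is trivial. When k ≠ k′,
then

\[ e^{-2\pi i (k-k')/N} \neq 1 \quad \text{and} \quad e^{-2\pi i (k-k')} = 1 \]

for any choice of k and k′, so that summing the geometric series gives

\[ \sum_{n=0}^{N-1} e^{-2\pi i n (k-k')/N} = \frac{1 - e^{-2\pi i (k-k')}}{1 - e^{-2\pi i (k-k')/N}} = 0. \quad ■ \]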


12.4 Applications in Physics and Engineering 


12.4.1 Fraunhofer Diffraction I 


In optics, Fourier transformation is a powerful tool to describe an important 
class of wave diffractions, called Fraunhofer diffraction; this refers to the 
diffraction of electromagnetic radiation observed at a point far from a slit or 
an aperture. A Fraunhofer diffraction pattern can be described by using the 




wave theory of light, which predicts the areas of constructive and destructive 
interference. 

Let us derive the diffraction pattern produced by a rectangular aperture
with half-width a and half-height b, i.e., an aperture occupying $|x'| \le a$ and
$|y'| \le b$. We assume that both incident and diffracted waves
can be approximated as being plane waves with wavelength $\lambda$. In order to make
this assumption, the diffracting obstacle and the observation point must be
sufficiently far from the light source so that the curvature of the incident and
diffracted light can be neglected (see Fig. 12.1). According to elementary wave
optics, the amplitude of light at R on the screen is given by


Fig. 12.1. Configuration of the light source, a rectangular aperture, and the screen


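\[ u(\mathbf{R}) = -\frac{ik}{2\pi} \int_{\Delta S'} u(\mathbf{r}')\, \frac{e^{ik|\mathbf{R}-\mathbf{r}'|}}{|\mathbf{R}-\mathbf{r}'|}\, d\mathbf{r}'. \]


Here, $k = 2\pi/\lambda$, $\Delta S'$ represents the area of the rectangular aperture through
which light passes, and $u(\mathbf{r}')$ is the amplitude of the incident wave at $\mathbf{r}'$ within
the aperture:


\[ u(\mathbf{r}') = A e^{i\mathbf{k}\cdot\mathbf{r}'}. \]


We assume that this incident wave propagates in the direction of the z-axis.
Then, the wave vector $\mathbf{k}$ is perpendicular to the position vector $\mathbf{r}'$ so that


\[ u(\mathbf{r}') = A = \text{const.} \]


Hence, approximating $1/|\mathbf{R}-\mathbf{r}'|$ by $1/R$ with $R = |\mathbf{R}|$, we have


\[ u(\mathbf{R}) \simeq -\frac{ikA}{2\pi R} \int dx' \int dy'\, e^{ik|\mathbf{R}-\mathbf{r}'|}. \tag{12.48} \]

Set $\mathbf{R} = (x, y, z)$ and $\mathbf{r}' = (x', y', 0)$, where the origin is located at the center
of the aperture. Under the assumption that $z \gg |x|, |y|$ and $|x|, |y| \gg |x'|, |y'|$,
we have


\[
\begin{aligned}
|\mathbf{R}-\mathbf{r}'| &= \sqrt{z^2 + (x-x')^2 + (y-y')^2} \\
&= \sqrt{R^2 - 2(xx'+yy') + x'^2 + y'^2} \\
&\simeq R - \frac{xx'+yy'}{R} + \frac{x'^2+y'^2}{2R} \simeq R - \frac{xx'+yy'}{R}.
\end{aligned}
\]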


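Substituting this into (12.48) and integrating over the aperture, $-a \le x' \le a$ and
$-b \le y' \le b$, yields


\[
\begin{aligned}
u(\mathbf{R}) &= -\frac{ikA}{2\pi R}\, e^{ikR} \int_{-a}^{a} e^{-ikxx'/R}\, dx' \int_{-b}^{b} e^{-ikyy'/R}\, dy' \\
&= -\frac{ikA}{2\pi R}\, e^{ikR} \cdot 2a\, \frac{\sin(kax/R)}{kax/R} \cdot 2b\, \frac{\sin(kby/R)}{kby/R}.
\end{aligned}
\]


The light intensity distribution $I(\mathbf{R})$ on the screen is thus given by


\[ I(\mathbf{R}) = |u(\mathbf{R})|^2 \propto k^2a^2b^2 \left[ \frac{\sin(kax/R)}{kax/R} \cdot \frac{\sin(kby/R)}{kby/R} \right]^2. \]


Remember that $(\sin\xi)/\xi = 0$ at $\xi = \pm n\pi$ with integers $n = 1, 2, \cdots$. In
addition, since $k = 2\pi/\lambda$, we conclude that


\[ I(\mathbf{R}) = 0 \quad \text{at} \quad x = \pm\frac{m\lambda R}{2a} \ \text{ or } \ y = \pm\frac{n\lambda R}{2b} \qquad (m, n = 1, 2, \cdots), \]


which describes the diffraction pattern generated on the screen.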
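
As a small numerical illustration of the dark-line condition, the sketch below evaluates the relative intensity for an aperture assumed to occupy |x'| ≤ a and |y'| ≤ b. The function names and the sample values (wavelength, distances) are hypothetical and chosen only for demonstration.

```python
import math


def sinc(xi):
    """(sin xi)/xi with the removable singularity at xi = 0 filled in."""
    return 1.0 if xi == 0.0 else math.sin(xi) / xi


def relative_intensity(x, y, a, b, wavelength, R):
    """I(x, y)/I(0, 0) for an aperture occupying |x'| <= a, |y'| <= b."""
    k = 2.0 * math.pi / wavelength
    return (sinc(k * a * x / R) * sinc(k * b * y / R)) ** 2


# Hypothetical values: half-widths 0.1 mm and 0.2 mm, 500 nm light, R = 1 m.
a, b, wavelength, R = 1.0e-4, 2.0e-4, 5.0e-7, 1.0
first_dark_x = wavelength * R / (2.0 * a)  # m = 1
first_dark_y = wavelength * R / (2.0 * b)  # n = 1
center = relative_intensity(0.0, 0.0, a, b, wavelength, R)
dark_x = relative_intensity(first_dark_x, 0.0, a, b, wavelength, R)
dark_y = relative_intensity(0.0, first_dark_y, a, b, wavelength, R)
print(first_dark_x, first_dark_y)
print(center, dark_x, dark_y)  # the last two vanish up to rounding
```

A narrower aperture (smaller a) pushes the first dark line outward, which is the usual reciprocal relation between aperture size and pattern width.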


12.4.2 Fraunhofer Diffraction II 


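We next consider the case of a circular aperture with radius a. For convenience,
we use the polar coordinates defined by $x = r\cos\theta$, $y = r\sin\theta$ and, within
the aperture, $x' = r'\cos\theta'$, $y' = r'\sin\theta'$. Then (12.48) reads


\[
\begin{aligned}
u(\mathbf{R}) &\propto \int_{\Delta S'} \exp\left[ -ik\,\frac{xx'+yy'}{R} \right] d\mathbf{r}' \\
&= \int_0^a dr' \int_0^{2\pi} d\theta'\, r' \exp\left[ -ik\,\frac{rr'(\cos\theta\cos\theta' + \sin\theta\sin\theta')}{R} \right] \\
&= \int_0^a dr' \int_0^{2\pi} d\theta'\, r' \exp\left[ -ik\,\frac{rr'\cos(\theta-\theta')}{R} \right].
\end{aligned}
\]


To make this expression concise, we use the following formulae based on the Bessel
function $J_n(x)$:


\[ \int_0^{2\pi} e^{i\zeta\cos\theta}\, d\theta = 2\pi J_0(\zeta), \qquad \int_0^{\eta} \zeta J_0(\zeta)\, d\zeta = \eta J_1(\eta). \]


Applying them to the integral above, we obtain


\[ u(\mathbf{R}) \propto 2\pi a^2\, \frac{J_1(kar/R)}{kar/R}, \]


where the explicit form of $J_1(x)$ is obtained from the definition of $J_n(x)$,


\[ J_n(x) = \left( \frac{x}{2} \right)^n \sum_{\ell=0}^{\infty} \frac{(-1)^\ell}{\ell!\,(n+\ell)!} \left( \frac{x}{2} \right)^{2\ell}. \]


These give us


\[ J_1(x) = \frac{x}{2} - \frac{x^3}{16} + \cdots, \]


and thus $\lim_{x\to 0} J_1(x)/x = 1/2$. The first zero of $J_1(x)$ is located at $x \simeq 1.22\pi$.
Therefore, the radius $r_0$ of the innermost dark ring on the screen is given by


\[ \frac{k a r_0}{R} = 1.22\pi, \quad \text{i.e.,} \quad r_0 = 0.61\, \frac{\lambda R}{a}. \]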
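
The numbers quoted for the circular aperture can be reproduced from the power series of the Bessel function. The sketch below is illustrative: the function names are hypothetical, the series is truncated at a fixed number of terms, and the first zero of J_1 is located by bisection on an interval assumed to bracket it.

```python
import math


def bessel_j(n, x, terms=40):
    """Power series J_n(x) = sum_l (-1)^l / (l! (n+l)!) (x/2)^(2l+n)."""
    return sum((-1) ** l / (math.factorial(l) * math.factorial(n + l))
               * (x / 2.0) ** (2 * l + n) for l in range(terms))


def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-12):
    """Bisection on [lo, hi], an interval in which J_1 changes sign once."""
    f_lo = bessel_j(1, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(1, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


zero = first_zero_of_j1()
ratio_small_x = bessel_j(1, 1e-6) / 1e-6
print(zero / math.pi)          # close to 1.22
print(zero / (2.0 * math.pi))  # coefficient of lambda*R/a in r_0, close to 0.61
print(ratio_small_x)           # close to 1/2
```

Since k = 2π/λ, the condition k a r_0 / R = zero gives r_0 = (zero / 2π) λR/a, which is why the second printed number is the coefficient of interest.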


12.4.3 Amplitude Modulation Technique 


We conclude this chapter with a discussion regarding the use of Fourier trans- 
formations in an amplitude modulation (AM) technique. This technique 
is used in electronic communication, most commonly for transmitting infor- 
mation via a radio carrier wave. As the name indicates, AM works by mod- 
ulating the vibrational amplitude of the transmitted signal according to the 
information being sent. This is in contrast to the frequency modulation 
(FM) technique that is also commonly used for transmitting sound, but by 
modulating its frequency. 

For AM, we use two kinds of waves: a carrier wave c(t) and a message
wave m(t) that contains information on the message to be transmitted. For
simplicity, the carrier wave is modeled here as a simple sinusoidal wave written as


\[ c(t) = C \cos(\omega_c t + \phi_c), \]
where the radio frequency (in Hertz) is given by $\omega_c/(2\pi)$. $C$ and $\phi_c$ are constants
representing the carrier amplitude and the initial phase, respectively,
and their values are set to 1 and 0. AM is then realized by determining the
product:
\[ y(t) = m(t) \cdot c(t), \]
whose Fourier transform $Y(\omega)$ is expressed as


\[
\begin{aligned}
Y(\omega) &= \mathcal{F}[m(t)c(t)] \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} m(t)\, e^{-i\omega t}\, \frac{e^{i\omega_c t} + e^{-i\omega_c t}}{2}\, dt \\
&= \frac{1}{2} \left[ M(\omega + \omega_c) + M(\omega - \omega_c) \right].
\end{aligned}
\tag{12.49}
\]


Here $M(\omega)$ is the Fourier transform of m(t).


Fig. 12.2. Top: A carrier wave $c(t) = \sin(\omega_c t)$ with $\omega_c = 5.0$ and a message wave
$m(t) = 2\exp[-(t - t_0)^2/4]$ with $t_0 = 1.5$. Middle: The products $c(t)m(t) = y(t)$ and
$c^2(t)m(t)$. Bottom: The power spectra $|\mathcal{F}[y(t)]|^2$ and $|\mathcal{F}[c(t)y(t)]|^2$




The result in (12.49) implies that the modulated signal y(t) has two groups
of components: one at positive frequencies (centered at $+\omega_c$) and one at negative
frequencies (centered at $-\omega_c$). Figure 12.2 illustrates a carrier wave
$c(t) = \sin(\omega_c t)$ with $\omega_c = 5.0$, a message wave $m(t) = 2\exp[-(t - t_0)^2/4]$ with
$t_0 = 1.5$, and the power spectrum of $y(t) = c(t)m(t)$ [i.e., the $\omega$-dependence of
$Y(\omega)$] described by (12.49), together with the associated message wave m(t).
The frequency shift from $\omega$ to $\omega + \omega_c$, which is clearly evident, facilitates the
tuning of the frequency of the transmitted signal to the desired value. We are
concerned only with positive frequencies. The negative ones are mathematical
artifacts that carry no additional information.

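In order to reproduce the original signal m(t) from the modulated one
y(t), it is sufficient to multiply y(t) by c(t) and follow that with a filtering
process. The Fourier transform of the product c(t)y(t) is given as


\[
\begin{aligned}
\mathcal{F}[c(t)y(t)] &= \mathcal{F}[m(t)\cos^2(\omega_c t)] \\
&= \frac{1}{2} M(\omega) + \frac{1}{4} \left[ M(\omega + 2\omega_c) + M(\omega - 2\omega_c) \right].
\end{aligned}
\]


We pick up the first term in the last expression, filtering out the components
centered at $\pm 2\omega_c$, and take its inverse transform,
thus obtaining $\mathcal{F}^{-1}[M(\omega)] = m(t)$.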
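
The frequency shift and the demodulation step can be illustrated numerically. The sketch below is a minimal check under stated assumptions: it uses the cosine carrier with C = 1 and zero phase, the Gaussian message wave and ω_c = 5.0 quoted for Fig. 12.2, the symmetric convention with prefactor 1/√(2π), and a simple trapezoidal rule on a truncated time interval. All function names are hypothetical.

```python
import cmath
import math

OMEGA_C = 5.0  # carrier frequency, as quoted for Fig. 12.2
T0 = 1.5       # center of the Gaussian message wave


def message(t):
    return 2.0 * math.exp(-(t - T0) ** 2 / 4.0)


def carrier(t):
    return math.cos(OMEGA_C * t)  # assumed: C = 1, phase 0


def modulated(t):
    return message(t) * carrier(t)  # y(t) = m(t) c(t)


def demodulated(t):
    return carrier(t) * modulated(t)  # c(t) y(t) = m(t) cos^2(omega_c t)


def fourier(g, omega, t_min=-30.0, t_max=30.0, n=6000):
    """(1/sqrt(2 pi)) * integral of g(t) exp(-i omega t) dt, trapezoidal rule."""
    h = (t_max - t_min) / n
    total = 0j
    for i in range(n + 1):
        t = t_min + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * g(t) * cmath.exp(-1j * omega * t)
    return total * h / math.sqrt(2.0 * math.pi)


M0 = fourier(message, 0.0)                   # M(0)
Y_at_carrier = fourier(modulated, OMEGA_C)   # Y(omega_c), about M(0)/2
D0 = fourier(demodulated, 0.0)               # baseband term, about M(0)/2
print(abs(M0), abs(Y_at_carrier), abs(D0))
```

Because M(ω) for this message is negligible at |ω| = 2ω_c, the baseband part of c(t)y(t) is well separated from the components near ±2ω_c, which is what makes the filtering step practical.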


13 


Laplace Transformation 


Abstract Using the Laplace transform for the mathematical description of a physical
system considerably simplifies the analysis of its behavior. Many useful applications
and formulas related to Laplace transforms can be found in other textbooks,
but here we focus on the theoretical background, particularly, on the convergence 
properties of the various forms of Laplace transforms. It is important to note that a 
Laplace transform exists only if the corresponding improper integral, known as the 
Laplace integral, converges. Hence, the convergence of the improper integral must 
be confirmed prior to discussing the Laplace transform of a given function. Thus we 
devote a portion of this chapter to an analysis of the conditions necessary for the 
convergence of Laplace integrals, in contrast to the standard literature that deals 
primarily with the practical applications of Laplace transforms. 


13.1 Basic Operations 


13.1.1 Definitions 


The Laplace transformation associates a function f(x) of a real variable x
with a suitable function F(s) of a complex variable s. This correspondence is
essentially reciprocal and one-to-one, and it often allows us to replace a given
complicated function by a simpler one. The advantage of this operation manifests
particularly in applications to problems of linear differential equations (see
Chap. 15). We shall see that the Laplace transformation allows us to reduce
a linear differential equation of f(x) to a certain simple algebraic equation of
F(s), which yields solutions of the original differential equations more readily
than other techniques. Furthermore, it turns out that this reduction method
can be extended to systems of differential equations (ordinary and partial) as
well as to integral equations, which enhances the importance of studying and
understanding the Laplace transform.

To begin with, we define the Laplace transformation operator L that maps
a function f(x) to a corresponding function F(s):




♠ Laplace transformation:
The (one-sided) Laplace transformation, denoted by the operator L,
is defined by


\[ L[f(x)] = \int_0^\infty e^{-sx} f(x)\, dx \equiv F(s), \tag{13.1} \]


which associates an image function F(s) of the complex variable $s = \sigma + i\omega$
with a single-valued function f(x) (x real) such that the integral (13.1)
exists.


♠ Laplace integral:

The integral given in (13.1) is called the Laplace integral. If the
Laplace integral exists for a given f(x), the image function F(s) is called
the (one-sided) Laplace transform of f(x).


It is important to keep in mind the difference between the Laplace integral 
and the Laplace transform. Namely, the Laplace transform exists only when 
the Laplace integral exists (i.e., converges). Convergence properties of Laplace 
integrals are determined by the value of s and the feature of the function f(x), 
which is discussed fully in Sect. 13.3. In the meantime, we assume that f(x) 
is a function that allows the Laplace integral to converge for certain s. 


13.1.2 Several Remarks 


Below are several important remarks regarding the properties of the Laplace 
transform (13.1). 


1. The definition (13.1) states that for a given F(s), there is at most one continuous
function f(x). Nevertheless, it does not determine a unique f(x)
because if f(x) in (13.1) were altered at a finite number of isolated points,
F(s) would remain unchanged, as such discontinuous points make no contribution
to the integral. For this reason, we assume in the remainder of
this chapter that f(x) is continuous except at isolated points.


2. In order for the integral (13.1) to exist, any discontinuity of the integrand
inside the interval $(0, \infty)$ must be a finite jump so that there are right-hand
and left-hand limits at those discontinuous points. An exception is a
discontinuity at x = 0 (if it exists); for instance, the function $f(x) = 1/\sqrt{x}$
diverges at x = 0 but the integral (13.1) exists.


3. The inverse Laplace transform of F(s) is a function f(x) such that
L[f(x)] = F(s). Hence, the operation of taking an inverse Laplace transform
is denoted by $L^{-1}$ and we have

\[ L^{-1}[F(s)] = f(x). \]
This expression implies the possibility of dealing with the operators L
and $L^{-1}$ algebraically, just as the equation ax = y can be rewritten as
$x = a^{-1}y$. At this point, it is not clear as to how the inverse operation
$L^{-1}$ is to be performed, but actual manipulations are discussed in detail
in Sect. 13.4.2.


4. Not every function F(s) has an inverse Laplace transform. A sufficient
condition for F(s) to have its inverse transform is presented in Sect. 13.4.2.


13.1.3 Significance of Analytic Continuation 


Observe that the Laplace integral (13.1) involves a complex-valued term $e^{-sx}$
in its integrand, which makes it difficult to employ the standard methods
of integration that are applicable to real integrands. One way to proceed
would be to use the equation $e^{-sx} = e^{-\sigma x}\cos\omega x - i e^{-\sigma x}\sin\omega x$, which yields
two real integrands. This is, however, more complicated than necessary. An
easier method is to make use of the following theorem, which is verified in
Sect. 13.3.7:


♠ Analytic property of Laplace transform:

The Laplace transform F(s), which is a complex-valued function of a
complex variable s, is an analytic function in a region of $\mathrm{Re}(s) > \sigma_c$
with a specific real number $\sigma_c$.


Remark. Just at $\mathrm{Re}(s) = \sigma_c$, however, no general conclusion can be drawn.


This theorem states that once the value of $F(\sigma)$ on the real axis is known,
F(s) on an arbitrary point of the complex plane can be obtained by simply
replacing $\sigma$ by s. This replacement is based on an analytic continuation
from the semi-infinite line of the real axis, $\sigma > \sigma_c$, to the right half of the
s-plane, $\mathrm{Re}(s) > \sigma_c$, which is why we can perform the integration (13.1) as if
s were a real variable. Several examples given later clearly show the efficacy
of identifying s as a real parameter.

At first glance, the formality of replacing $\sigma$ by s amounts simply to a
change in symbol. But, without analytic continuation, we could no longer
regard our replacement from $\sigma$ to s as a mere formality; i.e., the concept of
analytic continuation lurks in the background.


Remark. In particular, those cases in which F(s) becomes multivalued cannot
be treated without paying heed in detail to the difference between $\sigma$ and s.
The latter issue regarding multivalued F(s) is discussed in Sect. 13.2.5.


13.1.4 Convergence of Laplace Integrals 


Emphasis is placed on the fact that the Laplace integral (13.1) may or may
not exist depending on the value of s as well as the nature of f(x). A sufficient
condition for the Laplace integral to converge is that the real component of
s, Re(s), is greater than a specific value. This intuitively follows from the
definition (13.1), which suggests that if the integral (13.1) exists for


\[ s_0 = \sigma_0 + i\omega_0, \]


then the integral also exists for every s such that $\mathrm{Re}(s) > \sigma_0$, since in the
latter case


\[ |e^{-sx}| < |e^{-s_0 x}| = e^{-\sigma_0 x}. \]


This is stated rigorously in the theorem below.


♠ Convergence of Laplace integrals:
If the Laplace integral


\[ \int_0^\infty f(x)\, e^{-sx}\, dx \tag{13.2} \]


converges for $\mathrm{Re}(s) = \sigma_0$, then it also converges for $\mathrm{Re}(s) > \sigma_0$.


The proof is given in Sect. 13.3.4. This theorem implies the existence of a
specific real number $\sigma_c$ such that the integral (13.2) converges for $\mathrm{Re}(s) > \sigma_c$
and diverges for $\mathrm{Re}(s) < \sigma_c$ (see Fig. 13.1). The number $\sigma_c$ is called the
abscissa of convergence of the Laplace integral, whose value depends on
the nature of the function f(x). With this notation, we say that the region
of convergence of the Laplace integral is a half-plane to the right of
$\mathrm{Re}(s) = \sigma_c$. This region of convergence is of course identified with the defining
region of the Laplace transform F(s).


Remark. By definition, $\sigma_c$ may take $-\infty$ (or $\infty$), which means that the integral
(13.2) converges (or diverges) for all $\sigma$.


Examples Set f(x) = 1 for every x > 0. Then


\[ L[f(x)] = \int_0^\infty e^{-sx}\, dx = \lim_{X\to\infty} \int_0^X e^{-sx}\, dx. \]


Hence, we have
\[ L[f(x)] = \frac{1}{s} \quad \text{for } s > 0. \]


For $s \le 0$, the integral does not converge. This indicates that in this case
$\sigma_c = 0$.


Fig. 13.1. The abscissa of convergence $\sigma_c$, to the right of which the Laplace integral
converges


13.1.5 Abscissa of Absolute Convergence 


When the Laplace integral converges in the ordinary sense, it might converge 
absolutely in part or in all of its converging region. (Remember that the con- 
ditions for absolute convergence are more stringent than those for ordinary 
convergence). This leads us to define an abscissa of absolute convergence as 
follows: 


♠ Abscissa of absolute convergence:


Suppose that the Laplace integral (13.2) converges absolutely for
$\mathrm{Re}(s) = \sigma_0$ as


\[ \int_0^\infty |f(x)\, e^{-sx}|\, dx = \int_0^\infty |f(x)|\, e^{-\sigma_0 x}\, dx < \infty. \tag{13.3} \]


The greatest lower bound $\sigma_a$ of such $\sigma_0$ that satisfies (13.3) is called the
abscissa of absolute convergence of the Laplace integral (13.2).


Thus once $\sigma_a$ is determined, we say that the integral (13.2) converges absolutely
for $\sigma > \sigma_a$, does not converge absolutely for $\sigma < \sigma_a$, and may or
may not converge absolutely at $\sigma = \sigma_a$. Since absolute convergence implies
ordinary convergence, it is clear that


\[ \sigma_c \le \sigma_a. \]


The following example shows that $\sigma_a$ does not generally coincide with $\sigma_c$ (see
Fig. 13.2).


Fig. 13.2. The abscissa of convergence $\sigma_c$ and the abscissa of absolute convergence
$\sigma_a$


Example $f(x) = e^x \sin e^x$
Set $u = e^x$; then we have


\[ F(s) = \int_0^\infty e^{-sx}\, e^x \sin e^x\, dx = \int_1^\infty \frac{\sin u}{u^s}\, du. \]


The integral converges absolutely for $\mathrm{Re}(s) = \sigma > 1$, converges conditionally
for $0 < \sigma \le 1$, and diverges for $\sigma \le 0$. Hence, we have


\[ \sigma_c = 0 \quad \text{and} \quad \sigma_a = 1, \]


which clearly indicates that in this case $\sigma_c \ne \sigma_a$.


13.1.6 Laplace Transforms of Elementary Functions 


Let us evaluate the Laplace transforms F(s) of several classes of elementary
functions. We treat the complex variable s as if it were real, bearing in
mind that this formalism is based on the analyticity of F(s), as discussed in
Sect. 13.1.3. The defining region of each F(s) is found on the right-hand side
of the equation in question.


1. $f(x) = x^n$, where n is a positive integer.


Integrating by parts, we have


\[
F(s) = L[x^n] = \int_0^\infty x^n e^{-sx}\, dx
= \left[ -\frac{x^n e^{-sx}}{s} \right]_0^\infty + \frac{n}{s} \int_0^\infty x^{n-1} e^{-sx}\, dx. \tag{13.4}
\]


Since s > 0 and n > 0, the first term in the last expression of (13.4)
vanishes. Iteration of this process yields


\[ L[x^n] = \frac{n(n-1)(n-2)\cdots 2\cdot 1}{s^n}\, L[x^0], \]


where $L[x^0] = L[1] = 1/s$. As a result, we have


\[ F(s) = L[x^n] = \frac{n!}{s^{n+1}} \qquad (\sigma > 0). \]


2. $f(x) = e^{ax}$, where a is a real constant.

\[ F(s) = L[e^{ax}] = \int_0^\infty e^{-sx} e^{ax}\, dx = \frac{1}{s-a} \qquad (\sigma > a). \]

3. $f(x) = \sin ax$, where a is a real constant.
Integrating by parts twice, we obtain


\[
\begin{aligned}
F(s) = L[\sin ax] &= \int_0^\infty e^{-sx} \sin ax\, dx \\
&= \left[ -\frac{e^{-sx}}{a} \cos ax \right]_0^\infty + \frac{1}{a} \int_0^\infty (-s)\, e^{-sx} \cos ax\, dx \\
&= \frac{1}{a} - \frac{s}{a} \left\{ \left[ \frac{e^{-sx}}{a} \sin ax \right]_0^\infty + \frac{s}{a} \int_0^\infty e^{-sx} \sin ax\, dx \right\} \\
&= \frac{1}{a} - \frac{s^2}{a^2}\, F(s),
\end{aligned}
\]


where we have used the fact that as s is positive, $e^{-sx} \to 0$ as $x \to \infty$,
whereas $\sin ax$ and $\cos ax$ are bounded as $x \to \infty$. Eventually, we get


\[ F(s) = L[\sin ax] = \frac{a}{s^2 + a^2} \qquad (\sigma > 0). \]
In a similar manner, we obtain
\[ L[\cos ax] = \int_0^\infty e^{-sx} \cos ax\, dx = \frac{s}{s^2 + a^2} \qquad (\sigma > 0). \]


4. $f(x) = \cosh ax$, where a is a real constant.


Using the linearity property of the Laplace transform operator L, we obtain


\[
\begin{aligned}
L[\cosh ax] &= L\left[ \frac{e^{ax} + e^{-ax}}{2} \right] = \frac{1}{2} L[e^{ax}] + \frac{1}{2} L[e^{-ax}] \\
&= \frac{1}{2} \left( \frac{1}{s-a} + \frac{1}{s+a} \right) = \frac{s}{s^2 - a^2} \qquad (\sigma > |a|).
\end{aligned}
\]
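
The closed forms obtained above can be compared with a direct numerical evaluation of the Laplace integral for real s. The sketch below is only a sanity check under stated assumptions: the infinite range is truncated at a finite upper limit, the composite Simpson rule is used, and the helper name `laplace_numeric` and the sample values of s, a, and n are hypothetical.

```python
import math


def laplace_numeric(f, s, upper=60.0, n=60000):
    """Composite Simpson approximation of int_0^upper f(x) exp(-s x) dx.

    The infinite range is truncated at `upper`, which is adequate only when
    the integrand has decayed to a negligible size there.
    """
    h = upper / n
    total = f(0.0) + f(upper) * math.exp(-s * upper)
    for i in range(1, n):
        x = i * h
        total += (4 if i % 2 else 2) * f(x) * math.exp(-s * x)
    return total * h / 3.0


s, a, n_pow = 2.0, 1.5, 3  # sample values satisfying s > |a|
cases = [
    ("x^n", lambda x: x ** n_pow, math.factorial(n_pow) / s ** (n_pow + 1)),
    ("exp(ax)", lambda x: math.exp(a * x), 1.0 / (s - a)),
    ("sin(ax)", lambda x: math.sin(a * x), a / (s ** 2 + a ** 2)),
    ("cos(ax)", lambda x: math.cos(a * x), s / (s ** 2 + a ** 2)),
    ("cosh(ax)", lambda x: math.cosh(a * x), s / (s ** 2 - a ** 2)),
]
errors = {name: abs(laplace_numeric(f, s) - exact) for name, f, exact in cases}
for name, err in errors.items():
    print(name, err)  # absolute difference between quadrature and closed form
```

Choosing s closer to the abscissa of convergence (for example s only slightly larger than a in the exponential case) slows the decay of the integrand, so the truncation limit would then need to be increased.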
Exercises 


1. Show the linearity of the Laplace transformation operator L.


Solution: It follows from the definition of the operator L that


\[
\begin{aligned}
L[c_1 f(x) + c_2 g(x)] &= \int_0^\infty e^{-sx} \left[ c_1 f(x) + c_2 g(x) \right] dx \\
&= c_1 \int_0^\infty e^{-sx} f(x)\, dx + c_2 \int_0^\infty e^{-sx} g(x)\, dx \\
&= c_1 L[f(x)] + c_2 L[g(x)],
\end{aligned}
\]


where $c_1$ and $c_2$ are arbitrary constants. This clearly shows the linearity
of the operator L. ♣


2. Find the Laplace transform of the function,


\[ f(x) = \begin{cases} 0, & 0 < x < c, \\ 1, & x > c. \end{cases} \]


Solution:
\[ L[f(x)] = \int_0^\infty e^{-sx} f(x)\, dx = \int_c^\infty e^{-sx}\, dx = \frac{e^{-cs}}{s} \qquad (\sigma > 0). \quad ♣ \]

3. Show that if f(x) is real and F(s) = L[f(x)] is single-valued, then F(s)
is real.
Solution: Set $s = \sigma > \sigma_c$ in the equation $F(s) = \int_0^\infty f(x)\, e^{-sx}\, dx$.
Then the integrand $f(x)e^{-\sigma x}$ is real, so $F(\sigma)$ is real. This establishes
that F(s) is real on the real axis to the right of the point $s = \sigma_c$. In
view of analytic continuation, therefore, F(s) is a real-valued analytic
function. ♣




13.2 Properties of Laplace Transforms 


13.2.1 First Shifting Theorem 


In physical applications, we are sometimes required to calculate the Laplace
transform of functions multiplied by exponential factors such as


\[ e^{-ax} f(x), \]


where a is real or complex. This kind of problem can be simplified by applying
the theorem below.


♠ The first shifting theorem:


If $F(s) = L[f(x)]$ for $\sigma > \sigma_c$, then
\[ F(s + a) = L[e^{-ax} f(x)] \]


for $\sigma > \sigma_c - \mathrm{Re}(a)$, where a is real or complex.


Proof Suppose $\sigma_c$ to be the abscissa of convergence for F(s). Then the integral


\[ \int_0^\infty e^{-ax} f(x)\, e^{-sx}\, dx = \int_0^\infty f(x)\, e^{-(s+a)x}\, dx \tag{13.5} \]


clearly converges for $\mathrm{Re}(s + a) > \sigma_c$. Observe that the integral on the right-hand
side of (13.5) is an expression for F(s + a). Thus we have the general
result:
\[ L[e^{-ax} f(x)] = F(s + a), \]


where F(s) = L[f(x)]. ♣


The above theorem states that if we know the Laplace transform of any function,
the transform of that function multiplied by an exponential can immediately
be obtained by a simple shift (or translation) in the s variable.


Examples 1. The first shifting theorem tells that


\[ L[e^{-ax} x^n] = \frac{n!}{(s+a)^{n+1}}, \qquad \sigma > -a, \]


since
\[ L[x^n] = \frac{n!}{s^{n+1}}, \qquad \sigma > 0. \]


2. Similarly, it follows from the first shifting theorem that


\[ L[e^{-ax} \sin bx] = \frac{b}{(s+a)^2 + b^2}, \qquad \sigma > -a, \]
where we use the fact that
\[ L[\sin bx] = \frac{b}{s^2 + b^2}. \]

13.2.2 Second Shifting Theorem 


For the next case, assume again that a function f(x) has a transform F(s)
and consider a shift in the x variable from x to $x - x_0$, where $x_0$ is a positive
constant. Stipulating that the new function be zero for $x < x_0$, it can be
written


\[ f(x - x_0)\,\theta(x - x_0), \tag{13.6} \]
where
\[ \theta(x) = \begin{cases} 1, & x > 0, \\ 0, & x < 0. \end{cases} \]


The Laplace transform of the shifted function (13.6) is thus represented by
the integral


\[ \int_0^\infty f(x - x_0)\,\theta(x - x_0)\, e^{-sx}\, dx = \int_{x_0}^\infty f(x - x_0)\, e^{-sx}\, dx. \]
Now we change the variable of integration to $x' = x - x_0$, which gives us
\[ L[f(x - x_0)\,\theta(x - x_0)] = e^{-sx_0} \int_0^\infty f(x')\, e^{-sx'}\, dx' = e^{-sx_0} F(s). \]

The result is stated formally below.

♠ The second shifting theorem:

If $F(s) = L[f(x)]$ for $\sigma > \sigma_c$, then
\[ e^{-sx_0} F(s) = L[f(x - x_0)\,\theta(x - x_0)] \]


for $\sigma > \sigma_c$, where $\theta(x)$ is a unit step function and $x_0$ is a real and positive
constant.


Examples Consider the Laplace transform of the function


\[ f(x) = \begin{cases} 0 & (x < 0), \\ 1/a & (0 \le x < a), \\ 0 & (x > a). \end{cases} \]


Using the step function, we express it as


\[ f(x) = \frac{\theta(x) - \theta(x - a)}{a}. \]
Hence, it follows from the second shifting theorem that


\[ L[f(x)] = \frac{L[\theta(x)] - e^{-as} L[\theta(x)]}{a} = \frac{1 - e^{-as}}{as}. \]


Note that in view of l'Hôpital's rule, when $a \to 0$, $L[f(x)] \to 1$. Since f(x)
approaches the Dirac delta function $\delta(x)$ in this limit, the latter
result means that the Laplace transform of $\delta(x)$ equals 1.
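
The rectangular-pulse example lends itself to a quick numerical check. The sketch below compares the closed form (1 − e^{−as})/(as) with a midpoint-rule evaluation of the defining integral and shows the approach to 1 as a → 0. The function names and sample values are hypothetical, and the pulse is assumed to have height 1/a on [0, a).

```python
import math


def transform_of_pulse(s, a):
    """(1 - exp(-a s)) / (a s): transform of a pulse of height 1/a on [0, a)."""
    return -math.expm1(-a * s) / (a * s)


def pulse_integral(s, a, n=2000):
    """Midpoint-rule value of int_0^a (1/a) exp(-s x) dx, for comparison."""
    h = a / n
    return sum(math.exp(-s * (i + 0.5) * h) for i in range(n)) * h / a


s = 2.0
for a in (1.0, 0.1, 0.001):
    print(a, transform_of_pulse(s, a), pulse_integral(s, a))

near_limit = transform_of_pulse(s, 1.0e-6)
print(near_limit)  # approaches 1 as the pulse narrows
```

`math.expm1` is used so that 1 − e^{−as} is evaluated without cancellation when a·s is very small.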


13.2.3 Laplace Transform of Periodic Functions 


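We now consider the Laplace transform of a periodic function f(x) of period
$\lambda$, i.e., $f(x + \lambda) = f(x)$. Assuming that f(x) is piecewise continuous, we
have by definition


\[
\begin{aligned}
L[f(x)] &= \int_0^\infty e^{-sx} f(x)\, dx \\
&= \int_0^{\lambda} e^{-sx} f(x)\, dx + \int_{\lambda}^{2\lambda} e^{-sx} f(x)\, dx + \int_{2\lambda}^{3\lambda} e^{-sx} f(x)\, dx + \cdots.
\end{aligned}
\]


On the right-hand side, let $x = u + \lambda$ in the second integral, $x = u + 2\lambda$ in
the third integral, and so on. We then get


\[
L[f(x)] = \int_0^{\lambda} e^{-sx} f(x)\, dx + \int_0^{\lambda} e^{-s(u+\lambda)} f(u + \lambda)\, du
+ \int_0^{\lambda} e^{-s(u+2\lambda)} f(u + 2\lambda)\, du + \cdots.
\]


From hypothesis, $f(u + \lambda) = f(u)$, $f(u + 2\lambda) = f(u)$, etc. Replacing the
dummy variable u by x yields


\[
\begin{aligned}
L[f(x)] &= \left( 1 + e^{-s\lambda} + e^{-2s\lambda} + \cdots \right) \int_0^{\lambda} e^{-sx} f(x)\, dx \\
&= \frac{1}{1 - e^{-s\lambda}} \int_0^{\lambda} e^{-sx} f(x)\, dx.
\end{aligned}
\tag{13.7}
\]

Once we introduce the function


\[ f_0(x) = \begin{cases} f(x), & 0 < x < \lambda, \\ 0, & \text{otherwise,} \end{cases} \]


equation (13.7) becomes


\[ L[f(x)] = \frac{F_0(s)}{1 - e^{-s\lambda}}, \]
where
\[ F_0(s) = \int_0^\infty e^{-sx} f_0(x)\, dx = \int_0^{\lambda} e^{-sx} f(x)\, dx. \]


So we have proven the following result:


♠ Laplace transform of a periodic function:


If f(x) is a periodic function of period $\lambda$, its Laplace transform is given
by
\[ F(s) = \frac{F_0(s)}{1 - e^{-s\lambda}}, \tag{13.8} \]


where


\[ F_0(s) = \int_0^{\lambda} e^{-sx} f(x)\, dx. \]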

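Examples Consider the Laplace transform of the periodic square wave described
by $f(x + 2\lambda) = f(x)$ with


\[ f(x) = \begin{cases} 1 & (0 < x < \lambda), \\ -1 & (\lambda < x < 2\lambda). \end{cases} \]


From (13.8), with the period taken as $2\lambda$, we obtain


\[
\begin{aligned}
F(s) &= \frac{1}{1 - e^{-2s\lambda}} \int_0^{2\lambda} e^{-sx} f(x)\, dx \\
&= \frac{1}{1 - e^{-2s\lambda}} \left( \int_0^{\lambda} e^{-sx}\, dx - \int_{\lambda}^{2\lambda} e^{-sx}\, dx \right) \\
&= \frac{1}{s} \tanh\left( \frac{s\lambda}{2} \right).
\end{aligned}
\]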
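
Formula (13.8) can be checked against a brute-force integration over many periods. The sketch below assumes, for illustration, a square wave equal to +1 on (0, λ) and −1 on (λ, 2λ) with period 2λ, real s > 0, and a midpoint rule on a truncated range. All names and sample values are hypothetical.

```python
import math


def square_wave(x, lam):
    """+1 on (0, lam), -1 on (lam, 2*lam), repeated with period 2*lam."""
    return 1.0 if (x % (2.0 * lam)) < lam else -1.0


def one_period_transform(s, lam):
    """F_0(s) = int_0^{2 lam} exp(-s x) f(x) dx, evaluated in closed form."""
    return (1.0 - math.exp(-s * lam)) ** 2 / s


def periodic_transform(s, lam):
    """Formula (13.8) applied with period 2*lam."""
    return one_period_transform(s, lam) / (1.0 - math.exp(-2.0 * s * lam))


def brute_force_transform(s, lam, periods=40, n_per_half=2000):
    """Midpoint-rule integral over many periods (the infinite range is truncated)."""
    h = lam / n_per_half
    total = 0.0
    for i in range(2 * periods * n_per_half):
        x = (i + 0.5) * h
        total += square_wave(x, lam) * math.exp(-s * x)
    return total * h


s, lam = 1.0, 1.0
closed = periodic_transform(s, lam)
brute = brute_force_transform(s, lam)
print(closed, brute, math.tanh(s * lam / 2.0) / s)
```

The geometric-series factor 1/(1 − e^{−2sλ}) is what replaces the explicit sum over periods in the brute-force loop.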


13.2.4 Laplace Transform of Derivatives and Integrals 


The Laplace transform of derivatives is a most important issue in terms of 
applications for solving differential equations. We shall see below that through 
the transform, certain kinds of differential equations are reduced to algebraic 
equations that are easy to manipulate. 




♠ Laplace transform of derivatives:


If $F(s) = L[f(x)]$ for $\sigma > \sigma_c$, and if


\[ \lim_{x\to\infty} e^{-sx} f(x) = 0 \quad \text{for } \sigma > \sigma_c, \tag{13.9} \]
then we have
\[ L[f'(x)] = sF(s) - f(0). \tag{13.10} \]


Proof Integration by parts yields


\[
L[f'(x)] = \int_0^\infty e^{-sx} f'(x)\, dx
= \left[ e^{-sx} f(x) \right]_0^\infty - \int_0^\infty (-s)\, e^{-sx} f(x)\, dx. \tag{13.11}
\]


The second term on the right-hand side of (13.11) converges to sF(s) for
$\sigma > \sigma_c$. In addition, the first term reads $-f(0)$ from the hypothesis of (13.9).
Thus for $\sigma > \sigma_c$, we obtain (13.10). ♣


This result can be extended to cases of higher derivatives.


♠ Laplace transform of higher derivatives:


Suppose f(x) to be such that $f^{(n-1)}(x)$ is continuous. If $F(s) = L[f(x)]$
for $\sigma > \sigma_c$ and if
\[ \lim_{x\to\infty} e^{-sx} f^{(k)}(x) = 0 \]
for $k = 0, 1, \cdots, n-1$ and $\sigma > \sigma_c$, then


\[ L[f^{(n)}(x)] = s^n F(s) - \sum_{k=1}^{n} s^{n-k} f^{(k-1)}(0). \]


The above theorem is central to the use of the Laplace transform for solv- 
ing differential equations with specified initial conditions (i.e., initial value 
problems). 
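
The derivative rule L[f′(x)] = sF(s) − f(0) can be verified numerically for a concrete function. The sketch below uses f(x) = e^{−x} cos 3x as a hypothetical test case, real s, and a truncated Simpson rule. The helper name `laplace_numeric` is an assumption made for this illustration.

```python
import math


def laplace_numeric(g, s, upper=40.0, n=40000):
    """Composite Simpson approximation of int_0^upper g(x) exp(-s x) dx."""
    h = upper / n
    total = g(0.0) + g(upper) * math.exp(-s * upper)
    for i in range(1, n):
        x = i * h
        total += (4 if i % 2 else 2) * g(x) * math.exp(-s * x)
    return total * h / 3.0


def f(x):
    return math.exp(-x) * math.cos(3.0 * x)


def f_prime(x):
    # Derivative of exp(-x) cos(3x), computed by hand.
    return -math.exp(-x) * (math.cos(3.0 * x) + 3.0 * math.sin(3.0 * x))


s = 1.5
lhs = laplace_numeric(f_prime, s)          # L[f'](s)
rhs = s * laplace_numeric(f, s) - f(0.0)   # s F(s) - f(0)
exact = s * (s + 1.0) / ((s + 1.0) ** 2 + 9.0) - 1.0
print(lhs, rhs, exact)
```

The closed form used for comparison follows from the first shifting theorem applied to L[cos 3x] = s/(s² + 9).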




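♠ Laplace transform of integrals:
If $g(x) = \int_0^x f(u)\, du$, $L[f(x)] = F(s)$ and if

\[ \lim_{x\to\infty} e^{-sx} g(x) = 0, \]

then

\[ L[g(x)] = \frac{F(s)}{s}. \]


Proof From hypothesis, we have $g(0) = 0$ and $g'(x) = f(x)$, and thus

\[ F(s) = L[g'(x)] = sL[g(x)] - g(0) = sL[g(x)]. \quad ♣ \]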

13.2.5 Laplace Transforms Leading to Multivalued Functions 


Some care should be taken when the Laplace transform results in a multivalued
function. A typical example is the transform of the function


\[ f(x) = \frac{1}{\sqrt{x}} \qquad (x > 0). \tag{13.12} \]


Although this function has a singularity at x = 0, the improper integral
having a real integrand,
\[ \int_0^\infty \frac{e^{-\sigma x}}{\sqrt{x}}\, dx, \tag{13.13} \]


converges for $\sigma > 0$. In what follows, we first evaluate the integral (13.13) and
then continue analytically with the result to arrive at a suitable region of
the complex s-plane where we can get a precise form of F(s).

The integral (13.13) can be readily evaluated by setting $\sigma x = u^2$; then it
reads
\[ \frac{2}{\sqrt{\sigma}} \int_0^\infty e^{-u^2}\, du = \frac{\sqrt{\pi}}{\sqrt{\sigma}}. \tag{13.14} \]


Now we would like to continue analytically to take the result (13.14) to the
complex s-plane. At first glance, it suffices to replace $\sqrt{\sigma}$ by $\sqrt{s}$ symbolically.
However, this is not sufficient because the function $\sqrt{s}$ is double-valued (e.g.,
when $s = i = e^{\pi i/2}$, $\sqrt{s}$ may take the two distinct values $e^{\pi i/4}$ and $e^{-3\pi i/4}$;
see Fig. 13.3). Thus we have two possible choices (i.e., two sheets of Riemann
surfaces) when performing analytic continuation from the real single-valued
function $\sqrt{\sigma}$ to the complex double-valued function $\sqrt{s}$. We go into only one
sheet of the Riemann surface, the choice being the one on which the points of $\sqrt{\sigma}$
are situated [i.e., the right half of the whole s-plane, expressed by $\mathrm{Re}(s) > 0$].
With this convention, we arrive at the result


\[ F(s) = L\left[ \frac{1}{\sqrt{x}} \right] = \sqrt{\frac{\pi}{s}}, \tag{13.15} \]


where the symbol $\sqrt{s}$ implies the single-valued branch mentioned above.


Fig. 13.3. The double-valuedness of the function $\sqrt{s}$


Remark. If the above case had been treated throughout with the variable s
retained, the formal variable change would have led to the factor $1/\sqrt{s}$ as in
(13.15). However, we would not then have a clear meaning for $\sqrt{s}$; i.e., there
would be no way to determine which branch is to be taken.
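
The choice of branch can be made concrete with a numerical evaluation at a complex value of s. The sketch below assumes Re(s) > 0, removes the singularity at x = 0 through the substitution x = u², and compares the integral with the value built from the principal square root returned by `cmath.sqrt`, which coincides with the positive root on the positive real axis. The function name and the sample point are hypothetical.

```python
import cmath
import math


def laplace_inv_sqrt(s, upper=12.0, n=24000):
    """int_0^inf exp(-s x)/sqrt(x) dx via x = u^2, i.e. 2 int_0^inf exp(-s u^2) du.

    Trapezoidal rule on [0, upper]; valid only for Re(s) > 0, where the
    integrand is negligible at the truncation point.
    """
    h = upper / n
    total = 0j
    for i in range(n + 1):
        u = i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * cmath.exp(-s * u * u)
    return 2.0 * total * h


s = 1.0 + 2.0j  # sample point with Re(s) > 0
numeric = laplace_inv_sqrt(s)
principal = math.sqrt(math.pi) / cmath.sqrt(s)  # branch continued from sqrt(sigma) > 0
other_sheet = -principal                        # value on the second sheet
print(numeric, principal)

# The two square roots of i quoted in the text both square to i.
root_a = cmath.exp(1j * math.pi / 4.0)
root_b = cmath.exp(-3j * math.pi / 4.0)
print(root_a ** 2, root_b ** 2)
```

The integral agrees with the principal-branch value and not with its negative, in line with the convention adopted for (13.15).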


Exercises 
1. Show that
\[ \lim_{x\to 0} f(x) = \lim_{s\to\infty} sF(s) \]
and


\[ \lim_{x\to\infty} f(x) = \lim_{s\to 0} sF(s), \]


where the Laplace integral $L[f(x)] = F(s)$ converges for $\sigma > 0$.
Solution: Take the limits $s \to \infty$ on both sides of equation
\[ \int_0^\infty f'(x)\, e^{-sx}\, dx = sF(s) - f(0). \tag{13.16} \]
Then we have $0 = \lim_{s\to\infty} sF(s) - f(0+)$, which gives us our first result.
Moreover, in the limit $s \to 0$, the left-hand side of (13.16) reads
$\int_0^\infty f'(x)\, dx = \lim_{x\to\infty} f(x) - f(0+)$, so that we get our second result. ♣


2. Find the transform of the function

\[ f(x) = \sqrt{x^k}, \]

where $k \ge 1$ and is an odd integer.


Solution: This function gives convergence for $\sigma > 0$. Integration by
parts yields a general recurrence equation:


\[
\int_0^\infty \sqrt{x^k}\, e^{-\sigma x}\, dx
= \left[ -\frac{\sqrt{x^k}\, e^{-\sigma x}}{\sigma} \right]_0^\infty + \frac{k}{2\sigma} \int_0^\infty \sqrt{x^{k-2}}\, e^{-\sigma x}\, dx.
\]


Since $k \ge 1$, the lower limit can be used in the first term on the right-hand
side (and thus the integral exists). The result can be stated as


\[ L\left[ \sqrt{x^k} \right] = \frac{k}{2s}\, L\left[ \sqrt{x^{k-2}} \right], \quad \text{where } k \ge 1 \text{ and odd.} \]


This yields a sequence of equations, starting with $\sqrt{x^{-1}}$, that is obtained
from (13.15). Consequently, we have


\[
L\left[ \sqrt{x^k} \right] = \frac{k}{2s} \cdot \frac{k-2}{2s} \cdots \frac{1}{2s}\, \sqrt{\frac{\pi}{s}}
= \frac{(k+1)!}{2^{k+1}\,[(k+1)/2]!}\, \frac{\sqrt{\pi}}{\sqrt{s^{k+2}}}.
\]


In these general equations, the root of a power of s is always interpreted
as being on the sheet of the Riemann surface on which the values of
$\sqrt{\sigma^{k+2}}$ are found. ♣
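
The factorial expression above can be cross-checked against the standard result L[x^ν] = Γ(ν + 1)/s^{ν+1} for ν > −1, which is not derived in this section but serves here as an independent reference. The sketch below compares only the s-independent coefficients for a few odd k, and the function name is hypothetical.

```python
import math


def half_power_coefficient(k):
    """(k+1)! sqrt(pi) / (2^(k+1) [(k+1)/2]!) for odd k >= 1."""
    if k < 1 or k % 2 == 0:
        raise ValueError("k must be an odd integer >= 1")
    return (math.factorial(k + 1) * math.sqrt(math.pi)
            / (2 ** (k + 1) * math.factorial((k + 1) // 2)))


pairs = [(half_power_coefficient(k), math.gamma(k / 2.0 + 1.0)) for k in (1, 3, 5, 7)]
for coefficient, gamma_value in pairs:
    print(coefficient, gamma_value)  # the two columns agree

# The recurrence L[sqrt(x^k)] = (k / 2s) L[sqrt(x^(k-2))] at the coefficient level.
recurrence_gap = abs(half_power_coefficient(5) - 2.5 * half_power_coefficient(3))
print(recurrence_gap)
```

Working with coefficients alone avoids any branch question, since the power of s is the same on both sides of the comparison.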


13.3 Convergence Theorems for Laplace Integrals 
13.3.1 Functions of Exponential Order 


The Laplace integral is improper by virtue of an infinite limit of integration,
as shown clearly by


\[ \int_0^\infty f(x)\, e^{-sx}\, dx = \lim_{R\to\infty} \int_0^R f(x)\, e^{-sx}\, dx. \tag{13.17} \]

This improper integral can be identified with the Laplace transform F(s) only
when it converges for the values of s in question. Therefore, it is important to
clarify the conditions under which the Laplace integral converges. As a first
step in addressing this issue, we introduce a new class of functions:




♠ Functions of exponential order:


A function f(x) is said to be of exponential order $\alpha_0$ if there is a real
number $\alpha_0$ such that


\[ \lim_{x\to\infty} f(x)\, e^{-\alpha x} = 0 \quad \text{for any } \alpha > \alpha_0, \tag{13.18} \]


and with the limit not existing when $\alpha < \alpha_0$.


See Fig. 13.4 for the decaying behavior of a function f(x) of exponential order
$\alpha_0$. Note that condition (13.18) is not necessarily satisfied at $\alpha = \alpha_0$. The order
number $\alpha_0$ may take $-\infty$ if f(x) is identically zero beyond some finite value
of x.


Fig. 13.4. Decaying behavior of a function f(x) of exponential order $\alpha_0$


Examples 1. The function $f(x) = x^3$ is of exponential order zero. To see
this, it suffices to check whether or not


\[ \lim_{x\to\infty} \left( x^3 e^{-\alpha x} \right) \tag{13.19} \]


exists. If $\alpha > 0$, then l'Hôpital's rule gives


\[ \lim_{x\to\infty} \frac{x^3}{e^{\alpha x}} = \lim_{x\to\infty} \frac{6}{\alpha^3 e^{\alpha x}} = 0. \]


In contrast, when $\alpha < 0$, (13.19) obviously diverges. Therefore $x^3$ is of
exponential order zero. In a similar manner, it can be shown that $x^n$ for
any integer n > 0 is of exponential order zero.


2. The function $f(x) = e^{cx}$ with any real constant c is of exponential order
c, owing to the fact that


\[ \lim_{x\to\infty} e^{cx} e^{-\alpha x} = 0 \]


if and only if $\alpha > c$.

13.3.2 Convergence for Exponential-Order Cases 


Suppose f(x) to be of exponential order ap. Then, we can show that the 
Laplace integral 


ie f(aje "dx (13.20) 


converges absolutely whenever the real component of s is located within the 
range 
Re (s) =a > ao. (13.21) 


Since absolute convergence implies ordinary convergence, the inequality (13.21) 
serves as a sufficient condition for the Laplace integral (13.20) to converge. 
This result is formally stated by the theorem below. 


@ Theorem: (= A sufficient condition for convergence for exponential- 
order cases) 


If f(x) is of exponential order ap, then the Laplace integral 
Jo. f(a)e~**dax converges for 


Re (s) > ao. 


(See also Fig. 13.5.) 


Proof For any o in the range of (13.21), we can pick a number a; such that 
ag <a, <o. 
Since f(a) is of exponential order ap, we have 
lim f(xje"™* = 0. 
200 
This implies that for any given small ¢« > 0, we can find an appropriate X 


such that 
|f(a)le"M" <e for any x > X. 


Hence, given any small ¢ > 0, there exists an X such that for A, A’ > X, 


A’ Al 
[ lt@le cra = fo [stele ete eae 


A A 


A’ 
< | en (9-81) doy, (13.22) 
A 


13.3 Convergence Theorems for Laplace Integrals 425 


where the last integral in (13.22) converges to a finite value because o > a. 
This means that the leftmost integral in (13.22) can be made to approach 
zero by taking X sufficiently large. Thus in view of the Cauchy’s test for 
improper integrals given in Sect. 3.4.4, the inequality (13.22) establishes 
the absolute (and thus ordinary) convergence of the integral (13.20) in the 
region Re (s) > ao. & 


Ims 


Res 
0 a 


Fig. 13.5. Converging region of the Laplace integral of a function of exponential 
order ao 


Remark. The above theorem provides a sufficient condition for the ordinary
convergence of the Laplace integral. Hence, the Laplace integral
of a function of exponential order $\alpha_0$ must converge for $\mathrm{Re}(s) > \alpha_0$,
whereas it may or may not converge for $\mathrm{Re}(s) \le \alpha_0$. For example,
$f(x) = \cos e^{x}$ gives $\alpha_0 = 0$, but the corresponding Laplace integral converges for
$\mathrm{Re}(s) > -1$.
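The example in the remark can be checked by a short calculation. The following is only a sketch; the substitution $u = e^{x}$ and the integration by parts are our own steps and do not appear in the text.

```latex
\begin{align*}
\int_0^\infty \cos(e^{x})\,e^{-sx}\,dx
  &= \int_1^\infty \cos u \; u^{-s-1}\,du
     \qquad (u = e^{x},\ dx = du/u) \\
  &= \Bigl[\sin u \; u^{-s-1}\Bigr]_1^\infty
     + (s+1)\int_1^\infty \sin u \; u^{-s-2}\,du \\
  &= -\sin 1 + (s+1)\int_1^\infty \sin u \; u^{-s-2}\,du .
\end{align*}
```

For $\mathrm{Re}(s) > -1$ the boundary term at infinity vanishes and the remaining integrand is bounded in modulus by $u^{-\mathrm{Re}(s)-2}$, which is integrable on $[1, \infty)$. The Laplace integral therefore converges on the half-plane $\mathrm{Re}(s) > -1$, which extends to the left of $\alpha_0 = 0$.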


13.3.3 Uniform Convergence for Exponential-Order Cases 


Next we examine the condition for uniform convergence. Here, uniform con- 
vergence means that the improper integral (13.20) as a function of s converges 
uniformly to F(s) over the whole defining region of the s-plane. To proceed, 
let $\alpha_2$ be a number greater than $\alpha_0$ and let $\sigma$ be in the range

$$\alpha_0 < \alpha_2 \le \sigma. \qquad (13.23)$$

For any choice of $\alpha_2$, we can find a number $\alpha_1$ such that

$$\alpha_0 < \alpha_1 < \alpha_2.$$

Since $\alpha_0 < \alpha_1 < \sigma$, the relation (13.22) is again valid, as expressed by

$$\int_A^{A'} |f(x)|\,e^{-\sigma x}\,dx < \varepsilon \int_A^{A'} e^{-(\sigma-\alpha_1)x}\,dx.$$

Furthermore, because $\sigma \ge \alpha_2$, we can extend this inequality to

$$\int_A^{A'} |f(x)|\,e^{-\sigma x}\,dx < \varepsilon \int_A^{A'} e^{-(\alpha_2-\alpha_1)x}\,dx.$$

Note that the last integral converges and is independent of $\sigma$. Therefore, in
view of the Weierstrass test for improper integrals (see Sect. 3.4.4), the
Laplace integral $\int_0^\infty f(x)\,e^{-sx}\,dx$ converges uniformly for $\mathrm{Re}(s) \ge \alpha_2 > \alpha_0$.
We have thus proved the following theorem:
We have thus proved the following theorem: 


Theorem (a sufficient condition for uniform convergence in
exponential-order cases):

If $f(x)$ is of exponential order $\alpha_0$, then the Laplace integral
$\int_0^\infty f(x)\,e^{-sx}\,dx$ converges uniformly to $F(s) = L[f(x)]$ for

$$\mathrm{Re}(s) \ge \alpha_2 > \alpha_0.$$


(See also Fig. 13.6.) 


Here, the constant $\alpha_2$ emphasizes that the converging region guaranteed by
this theorem is closed at the lower end.


Fig. 13.6. The region of uniform convergence associated with a function of
exponential order $\alpha_0$ (complex $s$-plane; horizontal axis $\mathrm{Re}\,s$, vertical axis $\mathrm{Im}\,s$)


Remark. It is important to remember that the above theorem gives only a 
sufficient condition for convergence of the Laplace integral. In fact, it is pos- 
sible that some functions of exponential order allow their Laplace integrals 
to converge uniformly to the left of $\alpha_0$.


13.3.4 Convergence for General Cases 


The previous two theorems tell a great deal about convergence of the Laplace 
integral for practical functions. On the other hand, for functions that are not 
of exponential order (but continuous within the integration interval), the 
following slightly different theorem applies. 


Theorem (a sufficient condition for convergence in general cases):

If the improper integral

$$\int_0^\infty f(x)\,e^{-sx}\,dx$$

converges for $s = s_0$, then it converges for $\mathrm{Re}(s) > \mathrm{Re}(s_0)$ (see also
Fig. 13.7).


Fig. 13.7. Converging region of the Laplace integral that converges for $s = s_0$

Proof. The proof requires an auxiliary function

$$g(x) = \int_x^\infty f(\tau)\,e^{-s_0 \tau}\,d\tau, \qquad (13.24)$$

where $f(x)$ is assumed to satisfy the conditions given above. Since $f(x)$ is
continuous, $g(x)$ is differentiable and its derivative is given by

$$g'(x) = -f(x)\,e^{-s_0 x}.$$

In terms of $g(x)$, the Laplace integral can be written as

$$\int_0^\infty f(x)\,e^{-sx}\,dx = \int_0^\infty f(x)\,e^{-s_0 x}\,e^{-wx}\,dx = -\int_0^\infty g'(x)\,e^{-wx}\,dx, \qquad (13.25)$$

where we have set $w = s - s_0$.

We now examine sufficient conditions for the rightmost integral in (13.25) 
to converge. Cauchy’s test for improper integrals given in Sect. 3.4.4 
says that it converges if and only if for an arbitrary small $\varepsilon > 0$, we can find
an $X$ that yields

$$\left| \int_{A'}^{A} g'(x)\,e^{-wx}\,dx \right| < \varepsilon \qquad (13.26)$$

with $A > A' > X$. Therefore, our task is to show that the relation (13.26) holds
for $\mathrm{Re}(s) > \mathrm{Re}(s_0)$.
Integration by parts gives us

$$\int_{A'}^{A} g'(x)\,e^{-wx}\,dx = -g(A')\,e^{-wA'} + g(A)\,e^{-wA} + w\int_{A'}^{A} g(x)\,e^{-wx}\,dx, \qquad (13.27)$$

which results in

$$\left| \int_{A'}^{A} g'(x)\,e^{-wx}\,dx \right|
\le \left| g(A')\,e^{-wA'} \right| + \left| g(A)\,e^{-wA} \right| + |w| \int_{A'}^{A} \left| g(x)\,e^{-wx} \right| dx. \qquad (13.28)$$

From (13.24) and from the hypothesis given in the theorem, it follows that
for an arbitrary small $\varepsilon' > 0$, there exists a number $X$ such that

$$|g(x)| < \varepsilon' \quad \text{when } x > X.$$

Thus, if $A', A > X$ we have

$$|g(A')|,\ |g(A)| < \varepsilon'.$$

In addition, if
$$u = \mathrm{Re}(w) > 0,$$
then the relation (13.28) becomes

$$\left| \int_{A'}^{A} g'(x)\,e^{-wx}\,dx \right|
< \varepsilon' \left[ e^{-uA'} + e^{-uA} + \frac{|w|}{u}\left( e^{-uA'} - e^{-uA} \right) \right]
\le \varepsilon' \left( 2 + \frac{|w|}{u} \right). \qquad (13.29)$$

Observe that the quantity in parentheses in (13.29) is finite for any fixed value
of $w$ with $u > 0$. Therefore, by making $\varepsilon'$ small enough, the quantity

$$\varepsilon = \varepsilon' \left( 2 + \frac{|w|}{u} \right) \qquad (13.30)$$

becomes arbitrarily small; this can be the $\varepsilon$ in the relation (13.26).
Consequently, the relation (13.26) holds for any $u > 0$, or equivalently, for any

$$u = \mathrm{Re}(w) = \mathrm{Re}(s) - \mathrm{Re}(s_0) > 0.$$

This completes the proof of the theorem. (Note that if $u = 0$, the quantity
in parentheses in (13.29) diverges, and if $u < 0$, the inequality (13.29) itself
does not hold.) ∎


Remark. The theorem is inconclusive for the convergence property on the
line $\mathrm{Re}(s) = \mathrm{Re}(s_0)$ of the complex $s$-plane. Writing $\sigma_0 = \mathrm{Re}(s_0)$, note that
the theorem does not guarantee convergence when $\mathrm{Re}(s) = \sigma_0$. This means that even
though the integral converges at a point on the line $\mathrm{Re}(s) = \sigma_0$, it does not
necessarily converge all along the same line. A simple example is given by

$$f(x) = \begin{cases} 0, & 0 < x < 1, \\[2pt] \dfrac{1}{x}, & x \ge 1. \end{cases}$$

The Laplace integral

$$\int_0^\infty f(x)\,e^{-s_0 x}\,dx = \int_1^\infty \frac{e^{-i\omega_0 x}}{x}\,dx
= \int_1^\infty \frac{\cos \omega_0 x}{x}\,dx - i \int_1^\infty \frac{\sin \omega_0 x}{x}\,dx$$

converges for $s_0 = 0 + i\omega_0$ with $\omega_0 \ne 0$, but diverges at $s_0 = 0$.


13.3.5 Uniform Convergence for General Cases 


A sufficient condition for uniform convergence is obtained in a similar way
as in Sect. 13.3.4, although it is not the same as that for ordinary convergence.
The difference is due to the fact that in the proof above, $\varepsilon$ defined by
(13.30) is dependent on $s$ through $|w| = |s - s_0|$. In order to get the range of
uniform convergence, we need a certain infinitesimal factor that can be taken
independently of $s$.

To derive such a factor, let $\theta$ be the angle of $w = s - s_0$, and observe that
$u = \mathrm{Re}(w)$ satisfies the relation

$$\frac{|w|}{u} = \frac{1}{\cos\theta}$$

when $u > 0$. If $\theta$ is restricted to the range

$$|\theta| < \frac{\pi}{2}, \qquad (13.31)$$

we can find an angle $\theta'$ that satisfies

$$|\theta| \le \theta' < \frac{\pi}{2},$$

or equivalently,

$$\frac{|w|}{u} = \frac{1}{\cos\theta} \le \frac{1}{\cos\theta'}.$$

Inserting this into (13.30), we have

$$\varepsilon \le \varepsilon' \left( 2 + \frac{1}{\cos\theta'} \right) \equiv \varepsilon'',$$

where the quantity $\varepsilon''$ is independent of $s$ and becomes arbitrarily small by
making $\varepsilon'$ small enough. This is true as far as condition (13.31) is satisfied;
in this context, (13.31) represents the region of uniform convergence
of the Laplace integral. Rewriting $\theta$ as $\arg(s - s_0)$, we arrive at the following
theorem:

Theorem (a sufficient condition for uniform convergence of the
Laplace integrals in general cases):
If the improper integral

$$\int_0^\infty f(x)\,e^{-sx}\,dx$$

converges for $s = s_0$, then it converges uniformly to $F(s) = L[f(x)]$ for

$$|\arg(s - s_0)| \le \theta' < \frac{\pi}{2}.$$

(See also Fig. 13.8.)

Here, $\theta'$ shows the closedness of the converging region. The angle $\theta'$ can be
arbitrarily close to, but not equal to, $\pi/2$.


Fig. 13.8. The region of uniform convergence for the Laplace integral that converges
for $s = s_0$ (complex $s$-plane; horizontal axis $\mathrm{Re}\,s$, vertical axis $\mathrm{Im}\,s$)


13.3.6 Distinction Between Exponential-Order Cases 
and General Cases 


We have thus far presented four convergence theorems in connection with
Laplace integrals, where the former two are associated with functions of
exponential order and the latter two are relevant to more general functions.
The theorems for the two cases are similar to the extent that they all identify
a half-plane of convergence for the Laplace integral. Moreover, the general
cases that we have considered cover a wide class of functions that includes
exponential-order functions as a special case. At first glance, these remarks
appear to imply that each of the former two theorems for exponential-order
cases is a special case of the corresponding theorem for general cases, but this
is not true at all. Below we give the reasons.

First, the theorem for ordinary convergence in the exponential-order case
is intrinsically different from that in the general case. Observe that the former
theorem not only tells us that the Laplace integral converges in a half-plane;
it also gives a specific number (i.e., $\alpha_0$) for the abscissa of a left-hand
boundary of such a half-plane. (Of course, the true abscissa of convergence
$\sigma_c$ satisfies $\sigma_c \le \alpha_0$, since the theorem gives only a sufficient
condition for convergence.) In contrast, the latter theorem merely states
convergence to the right of any point at which we already know that the
integral converges; it gives no information about a boundary of the region of
convergence.

Second, the regions of uniform convergence are specified in a different 
manner for the two cases. Whereas the theorem for general cases tells us 
only that the Laplace integral converges uniformly in an angular sector of the 
right half-plane, the theorem for exponential-order cases indicates uniform 
convergence in a less restricted region, namely, a half-plane. 

In short, the theorems for the two cases are essentially different. It should
also be emphasized again that all four theorems provide sufficient
conditions for convergence of the Laplace integrals; they are neither necessary
conditions nor necessary and sufficient conditions.


13.3.7 Analytic Property of Laplace Transforms 


An important consequence of uniform convergence of the Laplace integral is
the fact that the corresponding Laplace transform,

$$F(s) = \int_0^\infty f(x)\,e^{-sx}\,dx, \qquad (13.32)$$

is an analytic function on the complex $s$-plane. We know that if $F(s)$ is
analytic, it will exist outside the range of convergence of its integral
representation, where it can be uniquely determined by analytic continuation. From a
practical viewpoint, the analyticity of $F(s)$ plays a crucial role in evaluating
the Laplace transform of a given function, since we can use it to treat the
complex variable $s$ as if it were real (see Sect. 13.1.3). We close this section
by proving the analyticity of $F(s)$.


Theorem:
The Laplace transform $F(s)$ is analytic in the region of uniform convergence
of the corresponding Laplace integral (13.32).

Proof. We first recall that for $F(s)$ there is a region of uniform convergence
in the $s$-plane; we then perform a contour integration with respect to $s$
over an arbitrary simple closed path $C$ in this region. Owing to the uniform
convergence property, the order of integration may be inverted so that we have

$$\oint_C F(s)\,ds = \oint_C \left( \int_0^\infty f(x)\,e^{-sx}\,dx \right) ds
= \int_0^\infty f(x) \left( \oint_C e^{-sx}\,ds \right) dx = 0,$$

which gives us zero because Cauchy's integral theorem, applied to the entire
function $e^{-sx}$, means that

$$\oint_C e^{-sx}\,ds = 0.$$

Since the path $C$ is arbitrary in the region of uniform convergence, Morera's
theorem establishes that $F(s)$ is analytic inside the region of uniform
convergence of its corresponding Laplace integral. ∎


13.4 Inverse Laplace Transform 


13.4.1 The Two-Sided Laplace Transform 


This section describes the inverse Laplace transformation. Intuitively
understood, the inverse Laplace transform $L^{-1}[F(s)]$ of a function $F(s)$ is a
function $f(x)$ whose Laplace transform is $F(s)$. Nevertheless, the actual
operations represented by the operator $L^{-1}$ take some time to develop. To arrive
at the explicit formula for the inverse transformation, we first
introduce another kind of Laplace transform:


Two-sided Laplace transform:
If the improper integral

$$\int_{-\infty}^{\infty} f(x)\,e^{-sx}\,dx \qquad (13.33)$$

exists, it is called the two-sided Laplace transform (or bilateral
Laplace transform), designated by $F(s) = L[f(x)]$.


It is easy to determine the region of convergence of such an integral. Observe
that

$$L[f(x)] = \int_{-\infty}^{0} f(x)\,e^{-sx}\,dx + \int_0^\infty f(x)\,e^{-sx}\,dx. \qquad (13.34)$$

The second integral is an ordinary Laplace integral, so it converges on a
half-plane to the right of a fixed abscissa denoted by $\mathrm{Re}(s) = \sigma_{c1}$. By the
change of variable $x = -u$, the first integral becomes

$$\int_{-\infty}^{0} f(x)\,e^{-sx}\,dx = \int_0^\infty f(-u)\,e^{su}\,du.$$

Here, the latter integral is also an ordinary Laplace integral, although $s$ has
been replaced by $-s$. Hence, its region of convergence is a half-plane to the left
of an abscissa, say $\mathrm{Re}(s) = \sigma_{c2}$. As a result, the common part of the two
half-planes, $\sigma_{c1} < \mathrm{Re}(s) < \sigma_{c2}$, forms the region of convergence of the
integral (13.34) as depicted in Fig. 13.9.


Remark. Typically, the range of convergence of (13.34) forms a vertical strip
of finite width, but it may be a right half-plane, a left half-plane, the
whole $s$-plane, a single point, or fail to exist.


Fig. 13.9. Overlapping region: the region of convergence of the two-sided Laplace
integral

Example. We show that the function $1/(s^2 + s)$ can be expressed as a two-sided
Laplace integral. We readily see that

$$\frac{1}{s(s+1)} = \frac{1}{s} - \frac{1}{s+1}.$$

We know that

$$\frac{1}{s+1} = \int_0^\infty e^{-x}\,e^{-sx}\,dx \quad \text{for } \sigma > -1 \qquad (13.35)$$

and

$$\frac{1}{s} = \int_0^\infty e^{-sx}\,dx = \int_{-\infty}^{0} e^{sx}\,dx \quad \text{for } \sigma > 0,$$

and that the latter can be rewritten as

$$\frac{1}{s} = \int_{-\infty}^{0} e^{-sx}\,(-1)\,dx \quad \text{for } \sigma < 0. \qquad (13.36)$$

From (13.35) and (13.36), we obtain

$$\frac{1}{s(s+1)} = \frac{1}{s} - \frac{1}{s+1} = \int_{-\infty}^{\infty} f(x)\,e^{-sx}\,dx,$$

where

$$f(x) = \begin{cases} -e^{-x}, & 0 < x < \infty, \\ -1, & -\infty < x < 0, \end{cases}$$

which means that $1/(s^2 + s) = L[f(x)]$. The interval of convergence is seen to
be $-1 < \sigma < 0$.
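The example can be checked numerically. The sketch below is an illustration rather than part of the original argument: it assumes the piecewise function $f(x) = -e^{-x}$ for $x > 0$ and $f(x) = -1$ for $x < 0$, truncates the infinite range to a finite interval, and uses a plain midpoint rule. The helper name `two_sided_laplace` and the truncation limits are our own choices.

```python
import math


def f(x):
    """Piecewise function of the example: -exp(-x) for x > 0, -1 for x < 0."""
    return -math.exp(-x) if x > 0 else -1.0


def two_sided_laplace(func, s, lower=-60.0, upper=60.0, n=240000):
    """Midpoint-rule estimate of the two-sided Laplace integral for real s.

    The infinite range is truncated to [lower, upper]; this is adequate only
    when the integrand is negligible at both ends, i.e. for s inside the strip.
    """
    h = (upper - lower) / n
    total = 0.0
    for k in range(n):
        x = lower + (k + 0.5) * h  # midpoints never coincide with the jump at x = 0
        total += func(x) * math.exp(-s * x)
    return total * h


s = -0.5  # a real point inside the strip -1 < sigma < 0
numeric = two_sided_laplace(f, s)
exact = 1.0 / (s * s + s)
print(numeric, exact)
```

Choosing $s$ outside the strip $-1 < \sigma < 0$ makes one of the two tails grow exponentially, so the truncated sum no longer approximates a convergent integral.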


13.4.2 Inverse of the Two-Sided Laplace Transform 


Having introduced the two-sided Laplace transform, we are ready to undertake
the inverse Laplace transformation. We first observe that the two-sided Laplace
transform

$$F(s) = \int_{-\infty}^{\infty} f(x)\,e^{-sx}\,dx \qquad (13.37)$$

is identical with the Fourier transform

$$F(\sigma + i\omega) = \int_{-\infty}^{\infty} f(x)\,e^{-\sigma x}\,e^{-i\omega x}\,dx$$

if we regard the real number $\sigma$ as fixed. We use the inverse Fourier
transformation to yield

$$f(x)\,e^{-\sigma x} = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\sigma + i\omega)\,e^{i\omega x}\,d\omega,$$

or equivalently,

$$f(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\sigma + i\omega)\,e^{\sigma x}\,e^{i\omega x}\,d\omega. \qquad (13.38)$$

We then replace $\sigma + i\omega$ by $s$, keeping in mind that $s$ should lie on the
vertical line with the abscissa $\mathrm{Re}(s) = \sigma$. Then the integral (13.38) can be
regarded as a contour integral along the vertical line $\mathrm{Re}(s) = \sigma$. On this
contour,

$$ds = i\,d\omega,$$

so the integral (13.38) becomes

$$f(x) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} F(s)\,e^{sx}\,ds \quad (\mathrm{Re}(s) = \sigma \text{ is fixed}). \qquad (13.39)$$


This result provides a clue for evaluating the explicit form of $f(x)$ from its
two-sided Laplace transform $F(s)$.

The result (13.39) is not yet satisfactory. We should recall that $f(x)$ is
not determined uniquely by $F(s)$ through (13.39) unless the location of the
abscissa $\sigma$ is specified (see Exercise 3 in Sect. 13.4). If we know in advance
that $\sigma$ lies in the region of convergence of the two-sided integral given by
(13.37), i.e., the strip of convergence, $f(x)$ is uniquely determined by (13.39).
However, if $\sigma$ used in (13.39) is set outside this strip, the value of the integral
(13.39) changes because the integration contour has been moved across one or
more singular points of $F(s)$. Thus, to be able to use equation (13.39),
we must know the region of convergence of the Laplace integral of $f(x)$ before
we can fix the real number $\sigma$. If only $F(s)$ is given, we cannot locate this
region and hence cannot obtain $f(x)$, because we do not know
where to put $\sigma$. These caveats lead to the following theorem:


Theorem:
The inverse of the two-sided Laplace transform

$$f(x) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} F(s)\,e^{sx}\,ds \quad (\mathrm{Re}(s) = \sigma \text{ is fixed})$$

determines $f(x)$ uniquely only if we know where $\sigma$ should be located.
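The dependence on $\sigma$ can be made concrete with a rough numerical experiment. The sketch below is illustrative only: it takes $F(s) = 1/(s^2 + s)$ from the example of Sect. 13.4.1, truncates the inversion integral at $|\omega| = W$, and applies the trapezoidal rule. The function name `bromwich`, the cutoff `W`, and the number of nodes are assumptions made for this demonstration.

```python
import cmath
import math


def F(s):
    """Transform used for the experiment: F(s) = 1/(s^2 + s)."""
    return 1.0 / (s * s + s)


def bromwich(transform, x, sigma, W=400.0, n=80000):
    """Trapezoidal estimate of the inversion integral along Re(s) = sigma.

    With s = sigma + i*w and ds = i*dw, the integral equals
    (1/(2*pi)) * integral_{-W}^{W} F(sigma + i*w) * exp((sigma + i*w)*x) dw.
    """
    h = 2.0 * W / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        w = -W + k * h
        s = complex(sigma, w)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * transform(s) * cmath.exp(s * x)
    return (total * h / (2.0 * math.pi)).real


# sigma inside the strip -1 < sigma < 0: the two-sided original of the example.
inside_pos = bromwich(F, 1.0, -0.5)   # compare with -exp(-1)
inside_neg = bromwich(F, -1.0, -0.5)  # compare with -1
# sigma to the right of both poles: a different, one-sided original.
right_pos = bromwich(F, 1.0, 1.0)     # compare with 1 - exp(-1)
right_neg = bromwich(F, -1.0, 1.0)    # compare with 0
print(inside_pos, inside_neg, right_pos, right_neg)
```

The same $F(s)$ thus yields two different functions, depending on whether the contour runs inside the strip or to the right of both poles, which is the ambiguity the theorem warns about.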


13.4.3 Inverse of the One-Sided Laplace Transform 


Let us develop the theory that corresponds to the above for the one-sided
transform. We compare the two-sided transform of $f(x)$ and its one-sided
counterpart, where $f(x)$ is the same function in both cases and is
defined for all $x$. From the definitions of the one- and two-sided transforms,
it is evident that

$$F(s) = L[f(x)] = L[f(x)\,\theta(x)],$$

where the first $L$ denotes the one-sided transform, the second $L$ denotes the
two-sided transform, and $\theta(x)$ is the step function. This implies that

$$f(x)\,\theta(x) = \frac{1}{2\pi i} \int_{\sigma - i\infty}^{\sigma + i\infty} F(s)\,e^{sx}\,ds \quad (\sigma \text{ is fixed}). \qquad (13.40)$$

Here, $\sigma$ must be to the right of all the singularities of $F(s)$ in order for the
integral in (13.40) to converge. As a consequence, we have arrived at the
following theorem:

The inverse Laplace transformation:
If the function $F(s)$ defined by

$$F(s) = \int_0^\infty e^{-sx}\,f(x)\,dx$$

is analytic for $\mathrm{Re}(s) > \sigma_c$, then $f(x)$ for $x > 0$ is uniquely determined by

$$f(x) = \lim_{\omega \to \infty} \frac{1}{2\pi i} \int_{\sigma - i\omega}^{\sigma + i\omega} e^{sx}\,F(s)\,ds,$$

where $\sigma$ is arbitrary for all $\sigma > \sigma_c$.


13.4.4 Useful Formula for Inverse Laplace Transformation 


In contrast to the situation with the inverse Fourier transformation, the use of
the inverse Laplace transformation formula is less convenient. This is primarily
because the calculation of the complex integral

$$\int_{\sigma - i\infty}^{\sigma + i\infty} e^{sx}\,F(s)\,ds$$

can be rather complicated. In this subsection, we present a simple and natural
method of computing integrals of this form that is based on the residue
theorem.
Suppose that $F(s)$ is analytic on the domain $\mathrm{Re}(s) > \sigma_c$. We wish to
compute

$$f(x) = \lim_{\omega \to \infty} \frac{1}{2\pi i} \int_{\sigma - i\omega}^{\sigma + i\omega} e^{sx}\,F(s)\,ds, \quad x > 0.$$

No general method for doing this exists, but it is possible to evaluate this
integral under certain conditions on $F(s)$. Suppose that $F(s)$ is analytic on the
entire complex plane, except at a finite number of singularities $s_1, s_2, \cdots, s_n$
satisfying

$$\mathrm{Re}(s_j) < \sigma_c, \quad j = 1, 2, \cdots, n.$$

Figure 13.10 is a sketch of this situation. Let $\sigma > \sigma_c$ and let $R > 0$ be a
real number sufficiently large that the left half-circle $C_R$ with center $s = \sigma$
and radius $R$ encloses all the points $s_1, s_2, \cdots, s_n$. Divide $C_R$ into two
segments, the vertical diameter $I_R$ and the arc $\Gamma_R$:

$$I_R = \{ s \in \mathbb{C} :\ s = \sigma + i\omega,\ -R \le \omega \le R \},$$

$$\Gamma_R = \{ s \in \mathbb{C} :\ |s - \sigma| = R,\ \mathrm{Re}(s) < \sigma \}.$$


Fig. 13.10. A finite number of singularities $s_j$ of $F(s)$ enclosed by the left half-circle
$C_R$ composed of $I_R$ and $\Gamma_R$


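By the residue theorem,

$$\oint_{C_R} e^{sx}\,F(s)\,ds = 2\pi i \sum_{j=1}^{n} \mathrm{Res}\left[ e^{sx} F(s);\ s_j \right].$$

The right-hand side is independent of $R$ if $R$ is sufficiently large. From
$C_R = I_R \cup \Gamma_R$, it follows that

$$\oint_{C_R} e^{sx}\,F(s)\,ds = \int_{I_R} e^{sx}\,F(s)\,ds + \int_{\Gamma_R} e^{sx}\,F(s)\,ds.$$

Clearly,

$$\lim_{M \to \infty} \int_{\sigma - iM}^{\sigma + iM} e^{sx}\,F(s)\,ds = \lim_{R \to \infty} \int_{I_R} e^{sx}\,F(s)\,ds.$$

Therefore if, by chance, we have

$$\lim_{R \to \infty} \int_{\Gamma_R} e^{sx}\,F(s)\,ds = 0, \qquad (13.41)$$

then we obtain

$$f(x) = \lim_{M \to \infty} \frac{1}{2\pi i} \int_{\sigma - iM}^{\sigma + iM} e^{sx}\,F(s)\,ds = \sum_{j=1}^{n} \mathrm{Res}\left[ e^{sx} F(s);\ s_j \right].$$

Unfortunately, condition (13.41) does not hold for every $F$. The next theorem
presents a sufficient condition on $F$ under which (13.41) holds true.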


Theorem:
Let $F$ be an analytic function on the complex plane except at a finite
number of points (if they exist) and let $\Gamma_R$ be as above. If

$$\lim_{R \to \infty} \max_{s \in \Gamma_R} |F(s)| = 0,$$

then

$$\lim_{R \to \infty} \int_{\Gamma_R} e^{sx}\,F(s)\,ds = 0$$

holds for every $x > 0$.

Proof. This theorem is a reinterpretation of Jordan's lemma given in
Sect. 9.2.4. ∎


An immediate consequence of this theorem is the following: 


Theorem:
Let $F$ be an analytic function on the complex plane except at a finite
number of points $s_1, s_2, \cdots, s_n$ satisfying $\mathrm{Re}(s_j) < \sigma$ for all $j$. If

$$\lim_{R \to \infty} \max_{s \in \Gamma_R} |F(s)| = 0, \qquad (13.42)$$

then the inverse Laplace transform of $F(s)$ is given by

$$f(x) = \sum_{j=1}^{n} \mathrm{Res}\left[ e^{sx} F(s);\ s_j \right]. \qquad (13.43)$$


13.4.5 Evaluating Inverse Transformations 


Below are several examples of actual evaluations of inverse Laplace transforms 
via the residue formula (13.43). 


Example 1. Consider the complex-valued function

$$F(s) = \frac{1}{s^2 - 3s + 2},$$

which has two simple poles, $s_1 = 1$ and $s_2 = 2$. We thus choose $\sigma = 3$ and set

$$\Gamma_R = \{ s :\ |s - 3| = R,\ \mathrm{Re}(s) < 3 \}$$

in order to make use of equation (13.43). Before doing so, we must check that
condition (13.42) is satisfied. Observe that

$$\max_{s \in \Gamma_R} |F(s)| = \max_{s \in \Gamma_R} \left| \frac{1}{(s-1)(s-2)} \right|.$$

If we let $R = |s - 3|$ go to infinity, then $|s - 1|$ and $|s - 2|$ will also go
to infinity, so that

$$\lim_{R \to \infty} \max_{s \in \Gamma_R} \frac{1}{|(s-1)(s-2)|} = 0.$$

Thus (13.43) provides the desired result:

$$f(x) = \mathrm{Res}\left[ \frac{e^{sx}}{(s-1)(s-2)};\ s = 1 \right] + \mathrm{Res}\left[ \frac{e^{sx}}{(s-1)(s-2)};\ s = 2 \right] = -e^{x} + e^{2x}.$$

Remark. The above example can be solved more easily by rewriting $F$ using
partial fractions, $F(s) = 1/(s-2) - 1/(s-1)$, followed by applying known
equations to get

$$f(x) = e^{2x} - e^{x}.$$
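The residue sum in Example 1 can also be evaluated numerically. The following sketch is our own illustration, not the book's procedure: it estimates each residue of $e^{sx}F(s)$ for $F(s) = 1/(s^2 - 3s + 2)$ as a contour integral over a small circle around the pole, discretized by the trapezoidal rule. The helper `residue`, the circle radius, and the number of nodes are assumptions.

```python
import cmath
import math


def F(s):
    """F(s) = 1/(s^2 - 3 s + 2), with simple poles at s = 1 and s = 2."""
    return 1.0 / (s * s - 3.0 * s + 2.0)


def residue(g, pole, radius=0.25, n=2000):
    """Estimate Res[g; pole] = (1/(2*pi*i)) * (integral of g over a small circle).

    On s = pole + radius*exp(i*t) we have ds = i*radius*exp(i*t) dt, so the
    factor i cancels and the trapezoidal sum reduces to an average.
    """
    total = 0.0 + 0.0j
    for k in range(n):
        t = 2.0 * math.pi * k / n
        offset = radius * cmath.exp(1j * t)
        total += g(pole + offset) * offset
    return total / n


def inverse_by_residues(x):
    """Sum of the residues of exp(s*x)*F(s) at the two poles, as in (13.43)."""
    def g(s):
        return cmath.exp(s * x) * F(s)
    return (residue(g, 1.0) + residue(g, 2.0)).real


x = 1.0
print(inverse_by_residues(x), math.exp(2.0 * x) - math.exp(x))
```

The radius 0.25 is chosen so that each circle encloses exactly one pole; any radius smaller than the pole separation would serve.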


Example 2. It should be cautioned that equation (13.43) is valid only when
the condition (13.42) is satisfied. As a negative example, let us consider the
step function

$$\theta_c(x) = \begin{cases} 1, & x > c, \\ 0, & x < c, \end{cases} \qquad (c > 0),$$

whose Laplace transform is

$$F(s) = L[\theta_c(x)] = \frac{e^{-cs}}{s} \quad (s > 0).$$

We would like to derive $\theta_c(x)$ from $F(s)$ through the inverse transformation
given by

$$f(x) = L^{-1}[F(s)] = \lim_{M \to \infty} \frac{1}{2\pi i} \int_{\sigma - iM}^{\sigma + iM} \frac{e^{sx}\,e^{-cs}}{s}\,ds \quad (0 < x \ne c).$$

However, we cannot use (13.43) to obtain $f(x)$, since the function $e^{-cs}/s$ does
not satisfy condition (13.42). In fact, if we set $s = \sigma - R$, then

$$\max_{s \in \Gamma_R} \left| \frac{e^{-cs}}{s} \right| \ge \frac{e^{cR}\,e^{-c\sigma}}{|\sigma - R|} \to \infty \quad (R \to \infty),$$

since $c > 0$.

Remark. If we were to use (13.43) in Example 2, we would obtain a wrong
result. The function $e^{-cs}/s$ has a single simple pole at $s = 0$, so

$$\mathrm{Res}\left[ \frac{e^{sx}\,e^{-cs}}{s};\ s = 0 \right] = 1$$

for each value of $x$. This is, of course, not the step function $\theta_c(x)$.


Example 3. Next we consider the inverse Laplace transformation $L^{-1}[F(s)]$ of
the function

$$F(s) = \frac{1}{s + a} \quad (a > 0).$$

$F(s)$ has a first-order pole at $s = -a$. The residue of $F(s)\,e^{sx}$ at $s = -a$
reads

$$\mathrm{Res}\left[ F(s)\,e^{sx};\ -a \right] = \lim_{s \to -a} (s + a)\,F(s)\,e^{sx} = \lim_{s \to -a} e^{sx} = e^{-ax}.$$

Hence, we have

$$L^{-1}\left[ \frac{1}{s + a} \right] = e^{-ax}.$$


13.4.6 Inverse Transform of Multivalued Functions 


Some caution must be taken when considering the inverse Laplace transform of
multivalued functions. As an example, we consider a multivalued function
$F(s) = 1/\sqrt{s}$, and examine its inverse transform given by

$$f(x) = \frac{1}{2\pi i} \int_C \frac{e^{sx}}{\sqrt{s}}\,ds, \qquad (13.44)$$

where the symbol $\sqrt{s}$ represents values of the original double-valued function
$s^{1/2}$ on the same sheet of the Riemann surface. The function $1/\sqrt{s}$ has a
branch point at $s = 0$, so among many choices we set its branch cut at
$(-\infty, 0]$.

Since the function $1/\sqrt{s}$ approaches zero as $|s| \to \infty$, Jordan's lemma
is applicable. Nevertheless, the problem becomes rather complicated owing to
the presence of the branch cut. To perform the integration of (13.44), we close
the path $C$ by a circle to the left, bypassing the branch cut in the manner
shown in Fig. 13.11. No singularities are enclosed by the closed curve consisting
of $C' + \Gamma + \gamma + C''$, in which $C'$ is the vertical line, $C''$ is the pair of parallel
horizontal segments, $\gamma$ is the small circle of radius $\delta$, and $\Gamma$ is a semicircle
of radius $R$ from which the infinitesimal gap at the branch cut has been omitted.
Hence, we have

$$\left( \int_{C'} + \int_{\Gamma} + \int_{\gamma} + \int_{C''} \right) \frac{e^{sx}}{\sqrt{s}}\,ds = 0.$$

Fig. 13.11. Closed loop employed in evaluating the integral (13.44)

In the limit $R \to \infty$, the integral over $\Gamma$ vanishes and the path $C'$ reduces to
$C$ as given in (13.44), which implies that

$$f(x) = \lim_{R \to \infty} \frac{1}{2\pi i} \int_{C'} \frac{e^{sx}}{\sqrt{s}}\,ds = \lim_{R \to \infty} \frac{-1}{2\pi i} \int_{C'' + \gamma} \frac{e^{sx}}{\sqrt{s}}\,ds. \qquad (13.45)$$

Thus our remaining task is to evaluate the last term in (13.45).
Recall that $\sqrt{s}$ is double-valued so that it is discontinuous across the
branch cut. Consequently, on the parallel segments $C''$,

$$\sqrt{s} = i\sqrt{\rho} \quad \text{and} \quad -i\sqrt{\rho} \quad \text{with } \rho = |s|,$$

above and below the branch cut, respectively. On the small circle $\gamma$,

$$\sqrt{s} = \sqrt{\delta}\,e^{i\phi/2},$$

where $-\pi < \phi < \pi$, and $ds = -d\rho$ on each of the straight lines. Thus, we have

$$\int_{C'' + \gamma} \frac{e^{sx}}{\sqrt{s}}\,ds
= -i \int_{\delta}^{R} \frac{e^{-\rho x}}{\sqrt{\rho}}\,d\rho
- i \int_{\delta}^{R} \frac{e^{-\rho x}}{\sqrt{\rho}}\,d\rho
+ i\sqrt{\delta} \int_{\pi}^{-\pi} e^{i\phi/2}\,e^{x\delta e^{i\phi}}\,d\phi.$$

Let $\delta$ go to zero and $R$ approach infinity; then, the first two integrals on
the right-hand side combine into a single integral. The last integral on the
right-hand side approaches zero. As a result, we have

$$\lim_{\delta \to 0} \lim_{R \to \infty} \int_{C'' + \gamma} \frac{e^{sx}}{\sqrt{s}}\,ds = -2i \int_0^\infty \frac{e^{-\rho x}}{\sqrt{\rho}}\,d\rho,$$

which implies that

$$f(x) = \frac{1}{\pi} \int_0^\infty \frac{e^{-\rho x}}{\sqrt{\rho}}\,d\rho.$$

By substituting $\rho x = u^2$, the right-hand side becomes

$$\frac{1}{\pi} \int_0^\infty \frac{e^{-\rho x}}{\sqrt{\rho}}\,d\rho = \frac{2}{\pi\sqrt{x}} \int_0^\infty e^{-u^2}\,du = \frac{1}{\sqrt{\pi x}}.$$

Eventually, we obtain

$$L^{-1}\left[ \frac{1}{\sqrt{s}} \right] = \frac{1}{\sqrt{\pi x}},$$

which is consistent with the earlier result presented in (13.15).
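As a sanity check on this result, one may transform $1/\sqrt{\pi x}$ forward again and compare with $1/\sqrt{s}$. The sketch below is only a numerical illustration for real $s > 0$; the substitution $x = v^2$, the truncation point, and the function name are our own choices.

```python
import math


def laplace_of_inverse_sqrt(s, upper=12.0, n=200000):
    """Midpoint estimate of integral_0^inf exp(-s*x)/sqrt(pi*x) dx for real s > 0.

    The substitution x = v**2 removes the integrable singularity at x = 0 and
    gives (2/sqrt(pi)) * integral_0^inf exp(-s*v**2) dv, truncated at v = upper.
    """
    h = upper / n
    total = 0.0
    for k in range(n):
        v = (k + 0.5) * h
        total += math.exp(-s * v * v)
    return 2.0 / math.sqrt(math.pi) * total * h


for s in (0.5, 1.0, 4.0):
    print(s, laplace_of_inverse_sqrt(s), 1.0 / math.sqrt(s))
```

The two printed columns should agree closely for each $s$, as expected if $1/\sqrt{\pi x}$ is indeed the original of $1/\sqrt{s}$.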


Exercises 


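1. Find (a) $L^{-1}[5/(p + 2)]$ and (b) $L^{-1}[1/p^{s}]$ where $s > 0$.

Solution: (a) Recall that $L[e^{ax}] = 1/(p - a)$; hence $L^{-1}[1/(p - a)] = e^{ax}$.
It follows that

$$L^{-1}\left[ \frac{5}{p + 2} \right] = 5\,L^{-1}\left[ \frac{1}{p + 2} \right] = 5e^{-2x}.$$

(b) Recall that

$$L[x^{k}] = \int_0^\infty e^{-px}\,x^{k}\,dx = \frac{\Gamma(k + 1)}{p^{k+1}},$$

so

$$L^{-1}\left[ \frac{1}{p^{k+1}} \right] = \frac{x^{k}}{\Gamma(k + 1)}.$$

If we now let $k + 1 = s$, then

$$L^{-1}\left[ \frac{1}{p^{s}} \right] = \frac{x^{s-1}}{\Gamma(s)}.$$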

2. Solve the differential equation

$$f''(x) + f(x) = 1 \qquad (13.46)$$

with the initial conditions

$$f(0) = f'(0) = 0$$

using the Laplace transformation.

Solution: Taking the Laplace transform of both sides of (13.46),
we obtain

$$L[f''(x)] + L[f(x)] = L[1]. \qquad (13.47)$$

Substituting the result

$$L[f''(x)] = s^2 L[f(x)] - s\,f(0) - f'(0) = s^2 L[f(x)]$$

and $L[1] = 1/s$ into (13.47) yields

$$s^2 L[f(x)] + L[f(x)] = \frac{1}{s}.$$

Thus we see that

$$f(x) = L^{-1}[F(s)] = L^{-1}\left[ \frac{1}{s(s^2 + 1)} \right] = L^{-1}\left[ \frac{1}{s} - \frac{s}{s^2 + 1} \right]
= \begin{cases} 1 - \cos x & \text{for } x > 0, \\ 0 & \text{for } x < 0, \end{cases}$$

which is the solution of the initial value problem originally given
by (13.46). ∎


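3. Derive the two-sided Laplace transform of the following three functions:

$$f_a(x) = \begin{cases} e^{-2x} - e^{-x}, & x > 0, \\ 0, & x < 0, \end{cases} \qquad
f_b(x) = \begin{cases} e^{-2x}, & x > 0, \\ e^{-x}, & x < 0, \end{cases} \qquad
f_c(x) = \begin{cases} 0, & x > 0, \\ e^{-x} - e^{-2x}, & x < 0. \end{cases}$$

Solution: The two-sided Laplace transforms read, respectively,

$$L[f_a(x)] = \frac{1}{s + 2} - \frac{1}{s + 1} \quad \text{for } \sigma > -1,$$

$$L[f_b(x)] = \frac{1}{s + 2} - \frac{1}{s + 1} \quad \text{for } -2 < \sigma < -1,$$

$$L[f_c(x)] = \frac{1}{s + 2} - \frac{1}{s + 1} \quad \text{for } \sigma < -2.$$

Clearly, all the functions of $s$ are the same and may be labeled $F(s)$
(although the region of convergence is different). This implies that
the inverse of a two-sided transform is uniquely determined only
after the location of $\sigma$ is fixed. ∎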
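The solution of Exercise 2 can be cross-checked without any transform. The sketch below is an illustration under our own choices of method and step size: it integrates $f'' + f = 1$ with $f(0) = f'(0) = 0$ by the classical fourth-order Runge-Kutta scheme and compares the result with $1 - \cos x$.

```python
import math


def rk4_solve(x_end, n=2000):
    """Integrate f'' + f = 1, f(0) = f'(0) = 0, from 0 to x_end with classical RK4.

    The second-order equation is rewritten as the first-order system
    y' = v, v' = 1 - y, where y stands for f and v for f'.
    """
    h = x_end / n
    y, v = 0.0, 0.0
    for _ in range(n):
        k1y, k1v = v, 1.0 - y
        k2y, k2v = v + 0.5 * h * k1v, 1.0 - (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, 1.0 - (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, 1.0 - (y + h * k3y)
        y += h * (k1y + 2.0 * k2y + 2.0 * k3y + k4y) / 6.0
        v += h * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
    return y


x = 3.0
numeric = rk4_solve(x)
closed_form = 1.0 - math.cos(x)
print(numeric, closed_form)
```

Agreement between the two values supports both the inverse transform used in the solution and the initial conditions assumed here.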


13.5 Applications in Physics and Engineering 
13.5.1 Electric Circuits I 
The most familiar applications of Laplace transformations in the physical
sciences are encountered in analyses of electric circuits. Consider the RC circuit
depicted in Fig. 13.12. The electric charge $q(t)$ deposited in the condenser
with capacitance $C$ is governed by the equation

$$R\,\frac{dq(t)}{dt} + \frac{q(t)}{C} = v(t), \quad q(t = 0) = 0, \qquad (13.48)$$

where $R$ is a resistance and $v(t)$ is the external voltage. We set a rectangular
voltage defined by (see Fig. 13.13)

$$v(t) = v_0 \times [\theta(t - a) - \theta(t - b)] \quad (a < b) \qquad (13.49)$$

with the step function

$$\theta(t) = \begin{cases} 1, & t > 0, \\ 0, & t < 0. \end{cases}$$

Fig. 13.13. The time dependence of a rectangular voltage applied to the RC circuit 




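We now want to solve the differential equation (13.48) with respect to $q(t)$.
To do this, we apply the Laplace transform to both sides of (13.48) and make
use of the symbol $Q(s) = L[q(t)]$. Straightforward calculation yields

$$sQ(s) + \frac{Q(s)}{\tau} = \frac{v_0}{R} \left( \frac{e^{-as}}{s} - \frac{e^{-bs}}{s} \right),$$

where $\tau = RC$ is called a damping time constant. Hence, we have

$$q(t) = L^{-1}\left[ \frac{v_0}{R}\,\frac{1}{s + \tau^{-1}} \left( \frac{e^{-as}}{s} - \frac{e^{-bs}}{s} \right) \right]$$

$$= C v_0\,L^{-1}\left[ \left( \frac{1}{s} - \frac{1}{s + \tau^{-1}} \right) \left( e^{-as} - e^{-bs} \right) \right]$$

$$= C v_0\,L^{-1}\left[ \frac{e^{-as}}{s} - \frac{e^{-bs}}{s} - \frac{e^{-as}}{s + \tau^{-1}} + \frac{e^{-bs}}{s + \tau^{-1}} \right]$$

$$= C v_0 \left[ \theta(t - a) - \theta(t - b) - e^{-(t-a)/\tau}\,\theta(t - a) + e^{-(t-b)/\tau}\,\theta(t - b) \right]$$

$$= \begin{cases} 0, & t < a, \\ C v_0 \left[ 1 - e^{-(t-a)/\tau} \right], & a < t < b, \\ C v_0 \left[ e^{b/\tau} - e^{a/\tau} \right] e^{-t/\tau}, & t > b. \end{cases}$$

The explicit time dependence of the charge $q(t)$ obtained above is illustrated
in Fig. 13.14, in which various separations $b - a$ are taken.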
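The closed-form charge can be compared with a direct numerical integration of (13.48). The following sketch is illustrative only: the parameter values ($R = 1$, $C = 0.5$, $v_0 = 1$, $a = 1$, $b = 2$), the forward-Euler stepping, and the function names are assumptions made for the demonstration, not values taken from the text.

```python
import math


def theta(t):
    """Unit step function."""
    return 1.0 if t > 0 else 0.0


def charge(t, R, C, v0, a, b):
    """Closed-form q(t) for the rectangular voltage v0*[theta(t-a) - theta(t-b)]."""
    tau = R * C
    rise = 1.0 - math.exp(-(t - a) / tau) if t > a else 0.0
    fall = 1.0 - math.exp(-(t - b) / tau) if t > b else 0.0
    return C * v0 * (rise - fall)


def simulate_charge(t_end, R, C, v0, a, b, h=1e-4):
    """Forward-Euler integration of R*dq/dt + q/C = v(t) with q(0) = 0.

    The voltage is sampled at the middle of each step so that the switching
    times a and b never fall exactly on a sample point.
    """
    n = int(round(t_end / h))
    q = 0.0
    for k in range(n):
        t_mid = (k + 0.5) * h
        v = v0 * (theta(t_mid - a) - theta(t_mid - b))
        q += h * (v - q / C) / R
    return q


params = dict(R=1.0, C=0.5, v0=1.0, a=1.0, b=2.0)
for t in (1.5, 3.0):
    print(t, simulate_charge(t, **params), charge(t, **params))
```

During the pulse the charge rises toward $C v_0$ with time constant $\tau = RC$, and after $t = b$ it decays with the same time constant, which is the behavior expressed by the three branches of $q(t)$.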


Fig. 13.14. Time dependence of the electric charge $q(t)$ accumulated in the
condenser in the RC circuit (horizontal axis: time $t$; vertical axis: electric
charge $q(t)$). The parameter $a$ introduced in (13.49) is fixed at $a = 1.0$
13.5.2 Electric Circuits II 


Next, in order to illustrate the use of convolution integrals in applications
of Laplace transforms, we solve the previous equation (13.48) with respect to
the current $i(t)$ instead of the charge $q(t)$. We consider the integral equation

$$R\,i(t) + \frac{1}{C} \int_0^t i(u)\,du = v(t) \qquad (13.50)$$

with the rectangular voltage (13.49). The integral term on the left-hand side
in (13.50) is rewritten as a convolution integral:

$$\int_0^t i(u)\,du = \int_0^t i(u)\,\theta(t - u)\,du = \theta(t) * i(t), \qquad (13.51)$$

whose Laplace transform reads

$$L[\theta(t) * i(t)] = L[\theta(t)] \cdot L[i(t)] = \frac{1}{s}\,I(s).$$

Hence the Laplace transform of (13.50) becomes

$$R\,I(s) + \frac{I(s)}{Cs} = \frac{v_0}{s} \left( e^{-as} - e^{-bs} \right), \qquad (13.52)$$

which implies

$$I(s) = \frac{v_0}{R}\,\frac{e^{-as} - e^{-bs}}{s + \tau^{-1}} \quad (\tau = RC). \qquad (13.53)$$

Using the inverse transformation, we finally get

$$i(t) = L^{-1}[I(s)]
= \frac{v_0}{R} \left[ e^{-(t-a)/\tau}\,\theta(t - a) - e^{-(t-b)/\tau}\,\theta(t - b) \right]
= \begin{cases} 0, & t < a, \\[2pt] \dfrac{v_0}{R}\,e^{a/\tau}\,e^{-t/\tau}, & a < t < b, \\[6pt] \dfrac{v_0}{R} \left( e^{a/\tau} - e^{b/\tau} \right) e^{-t/\tau}, & t > b. \end{cases} \qquad (13.54)$$

Figure 13.15 illustrates the time dependence of the current $i(t)$.

Fig. 13.15. Time dependence of the current $i(t)$ in the RC circuit described by
(13.54) (horizontal axis: time $t$; vertical axis: current $i(t)$). The parameter $a$
introduced in (13.49) is fixed at $a = 1.0$
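One way to gain confidence in the expression for $i(t)$ is to substitute it back into the integral equation $R\,i(t) + (1/C)\int_0^t i(u)\,du = v(t)$. The sketch below does this numerically; the parameter values, the midpoint quadrature, and the function names are assumptions chosen for illustration.

```python
import math


def current(t, R, C, v0, a, b):
    """Closed-form i(t) = (v0/R)[exp(-(t-a)/tau) theta(t-a) - exp(-(t-b)/tau) theta(t-b)]."""
    tau = R * C
    on = math.exp(-(t - a) / tau) if t > a else 0.0
    off = math.exp(-(t - b) / tau) if t > b else 0.0
    return v0 / R * (on - off)


def residual(t, R, C, v0, a, b, n=20000):
    """Return R*i(t) + (1/C)*integral_0^t i(u) du - v(t) using a midpoint rule."""
    h = t / n
    integral = h * sum(current((k + 0.5) * h, R, C, v0, a, b) for k in range(n))
    v = v0 * ((1.0 if t > a else 0.0) - (1.0 if t > b else 0.0))
    return R * current(t, R, C, v0, a, b) + integral / C - v


params = dict(R=1.0, C=0.5, v0=1.0, a=1.0, b=2.0)
for t in (0.5, 1.5, 3.0):
    print(t, current(t, **params), residual(t, **params))
```

The residual should be close to zero at each time; note that the current jumps to $v_0/R$ at $t = a$ and changes sign at $t = b$, when the condenser starts to discharge.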


14 


Wavelet Transformation 


Abstract Similar to the Fourier and Laplace transforms, a wavelet transform is an 
integral transform of a function by using “wavelets.” A wavelet is a mathematical 
mold with a finite-length and fast-decaying oscillating waveform, which is used to 
divide a given function into different scale components. Wavelet transforms have 
certain advantages over conventional Fourier transforms, as they can reveal the 
nature of a function in the time and frequency domains simultaneously. 


14.1 Continuous Wavelet Analyses 


14.1.1 Definition of Wavelet 


This short chapter covers the minimum ground for understanding wavelet 
analysis. The concept of wavelet originates from the study of signal analysis, 
i.e., from the need in certain cases to analyze a signal in the time and frequency 
domains simultaneously. The crucial advantage of wavelet analyses is that 
they allow us to decompose complicated information contained in a signal 
into elementary functions associated with different time scales and different 
frequencies and to reconstruct it with high precision and efficiency. In the 
following discussions, we first determine what constitutes a wavelet and then 
describe how it is used in the transformation of a signal. 
The primary question concerns the definition of a wavelet: 


Wavelet:
A wavelet is a real-valued function $\psi(t)$ having a localized waveform
that satisfies the following criteria:

1. The integral of $\psi(t)$ is zero: $\int_{-\infty}^{\infty} \psi(t)\,dt = 0$.

2. The square of $\psi(t)$ integrates to unity: $\int_{-\infty}^{\infty} \psi^2(t)\,dt = 1$.

3. The Fourier transform $\Psi(\omega)$ of $\psi(t)$ satisfies the admissibility
condition expressed by

$$C_\psi \equiv \int_0^\infty \frac{|\Psi(\omega)|^2}{\omega}\,d\omega < \infty. \qquad (14.1)$$

Here, $C_\psi$ is called the admissibility constant, whose value depends
on the chosen wavelet.


We restrict our attention to real-valued wavelets, although it is possible to
define complex-valued wavelets as well. Observe that condition 2 above says
that $\psi(t)$ has to deviate from zero at finite intervals of $t$. On the other hand,
condition 1 tells us that any deviation above zero must be canceled out by a
deviation below zero. Hence, $\psi(t)$ must oscillate across the $t$-axis like a wave.
The following are the two most important examples of wavelets:


Examples 1. The Haar wavelet (see Fig. 14.1a):

$$\psi(t) = \begin{cases} \dfrac{1}{\sqrt{2}}, & -1 < t \le 0, \\[6pt] -\dfrac{1}{\sqrt{2}}, & 0 < t \le 1, \\[6pt] 0, & \text{otherwise}. \end{cases} \qquad (14.2)$$

2. The Mexican hat wavelet (see Fig. 14.1b):

$$\psi(t) = \frac{2}{\sqrt{3\sigma}\,\pi^{1/4}} \left( 1 - \frac{t^2}{\sigma^2} \right) e^{-t^2/(2\sigma^2)}. \qquad (14.3)$$

To form the Mexican hat wavelet (14.3), we start with the Gaussian function
with mean zero and variance $\sigma^2$:

$$f(t) = \frac{e^{-t^2/(2\sigma^2)}}{\sqrt{2\pi}\,\sigma}.$$

If we take the negative of the second derivative of $f(t)$ with normalization
for satisfying condition 2, we obtain the Mexican hat wavelet (14.3). In the
meantime, we proceed with our argument on the basis of that wavelet by
setting $\sigma = 1$ and omitting the normalization constant for simplicity.

Fig. 14.1. (a) The Haar wavelet given by (14.2). (b) The Mexican hat wavelet
given by (14.3) with $\sigma = 1$


Remark. We know that all the derivatives of the Gaussian function may be
used as wavelets. The most appropriate one in any particular case depends on
the application.
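Conditions 1 and 2 of the wavelet definition are easy to test numerically. The sketch below is a check under stated assumptions: it takes the Mexican hat wavelet in the normalized form $\psi(t) = \frac{2}{\sqrt{3\sigma}\,\pi^{1/4}}\,(1 - t^2/\sigma^2)\,e^{-t^2/(2\sigma^2)}$, truncates the integrals to $|t| \le 12\sigma$, and uses a midpoint rule. The function names and the truncation are our own.

```python
import math


def mexican_hat(t, sigma=1.0):
    """Normalized Mexican hat wavelet with width parameter sigma (assumed form)."""
    norm = 2.0 / (math.sqrt(3.0 * sigma) * math.pi ** 0.25)
    u = t / sigma
    return norm * (1.0 - u * u) * math.exp(-0.5 * u * u)


def integrate(g, lower, upper, n=40000):
    """Plain midpoint rule on [lower, upper]."""
    h = (upper - lower) / n
    return h * sum(g(lower + (k + 0.5) * h) for k in range(n))


for sigma in (1.0, 2.0):
    span = 12.0 * sigma  # the Gaussian factor is negligible beyond this range
    mean = integrate(lambda t: mexican_hat(t, sigma), -span, span)
    energy = integrate(lambda t: mexican_hat(t, sigma) ** 2, -span, span)
    print(sigma, mean, energy)
```

For each $\sigma$ the first integral should vanish (condition 1) and the second should equal one (condition 2), up to quadrature error.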


14.1.2 The Wavelet Transform 


In mathematical terminology, the wavelet transform is known as a con- 
volution; more precisely, it is a convolution of the wavelet function with a 
signal to be analyzed. In the convolution process, two parameters are involved 
that manipulate the function form of the wavelet. The first is the dilatation 
parameter denoted by a, which characterizes the dilation and contraction of 
the wavelet in the time domain (see Fig. 14.2a). For the Mexican hat wavelet, 
it is the distance between the center of the wavelet and its crossing of the time 
axis. The second is the translation parameter b, which governs the move- 
ment of the wavelet along the time axis (see Fig. 14.2b). With this notation, 
shifted and dilated versions of a Mexican hat wavelet are expressed by 


\[
\psi\!\left(\frac{t-b}{a}\right)
= \left[1 - \left(\frac{t-b}{a}\right)^{2}\right]
  e^{-\frac{1}{2}\left[(t-b)/a\right]^{2}},
\tag{14.4}
\]

where we have set $\sigma = 1$ in (14.3) and omitted the normalization factor for
simplicity. We are now in a position to define the wavelet transform.

Fig. 14.2. Translation (a) and dilatation of a wavelet (b)


@ Wavelet transform: 
The wavelet transform T(a, b) of a continuous signal x(t) with respect
to the wavelet $\psi(t)$ is defined by

\[
T(a,b) = w(a) \int_{-\infty}^{\infty} x(t)\,
\psi\!\left(\frac{t-b}{a}\right) dt,
\tag{14.5}
\]

where w(a) is an appropriate weight function.


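Typically, w(a) is set to $1/\sqrt{a}$ because this choice yields

\[
\int_{-\infty}^{\infty}
\left[\frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)\right]^{2} dt
= \int_{-\infty}^{\infty} \psi(u)^{2}\, du = 1
\quad \text{with} \quad u = \frac{t-b}{a},
\]

i.e., the normalization condition for the square integral of $\psi(t)$ remains
invariant, which is why we use this value for the rest of this section.
The dilated and shifted wavelet is often written more compactly as

\[
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),
\]

so that the transform integral may be written as

\[
T(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}(t)\, dt.
\tag{14.6}
\]

From here on, we use this notation and refer to $\psi_{a,b}(t)$ simply as the wavelet.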
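The definition (14.6) and the role of the weight $1/\sqrt{a}$ can be tried out numerically. The following is a minimal sketch, assuming the Mexican hat wavelet with $\sigma = 1$ and its normalization constant omitted, a plain Riemann sum in place of the integral, and a hypothetical grid; the helper names `mexican_hat`, `psi_ab` and `cwt_point` are illustrative only.

```python
import numpy as np

def mexican_hat(u):
    """Mexican hat with sigma = 1; normalization constant omitted (assumption)."""
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def psi_ab(t, a, b):
    """Dilated and shifted wavelet psi_{a,b}(t) = psi((t - b)/a) / sqrt(a)."""
    return mexican_hat((t - b) / a) / np.sqrt(a)

def cwt_point(x, t, a, b):
    """Riemann-sum estimate of T(a, b) in (14.6) on a uniform grid t."""
    dt = t[1] - t[0]
    return float(np.sum(x * psi_ab(t, a, b)) * dt)

# Hypothetical grid, wide enough for the scales used below.
t = np.linspace(-60.0, 60.0, 24001)
dt = t[1] - t[0]

# With the weight 1/sqrt(a), the square integral of psi_{a,b}
# should be the same for every choice of a and b.
energies = [float(np.sum(psi_ab(t, a, b) ** 2) * dt)
            for a, b in [(0.5, 0.0), (1.0, 2.0), (3.0, -1.0)]]
print(energies)

# Transform of x(t) = sin t at two wavelet positions with a = 1.
x = np.sin(t)
print(cwt_point(x, t, 1.0, np.pi / 2), cwt_point(x, t, 1.0, -np.pi / 2))
```

Because the normalization constant is omitted here, the common value of the square integral is not 1; what matters is that it does not change with a or b.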


14.1.3 Correlation Between Wavelet and Signal 


Having defined the wavelet and its transform, we are ready to see how the
transform is used as a signal analysis tool. In plain words, the wavelet
transform works as a mathematical microscope, where b is the location on the
time series being viewed and a represents the magnification at location b.

Let us look at a simple example evaluating the wavelet transform T(a, b).
Figures 14.3 and 14.4 show the same sinusoidal waves together with Mexican
hat wavelets of various locations and dilations. In Fig. 14.3a, the wavelet is
located on a segment of the signal on which the positive part of the signal
coincides closely with that of the wavelet. This results in a large positive
value of T(a, b) in (14.6). In Fig. 14.3b, the wavelet is moved to a new location
where the wavelet and the signal are out of phase. In this case, the convolution
expressed by (14.6) produces a large negative value of T(a, b). In between these
two extrema, the value of T(a, b) decreases from a maximum to a minimum as
shown in Fig. 14.3c. The three panels thus clearly demonstrate how the wavelet
transform T(a,b) depends on the translation parameter b of the wavelet of
interest.

Fig. 14.3. (a), (b) Positional relations between the wavelet (thick) and signal
(thin). The wavelet in (a) located at $b_1 = \pi/2$ is in phase with the signal, which
results in a large positive value of T(a,b) at $b_1$. The wavelet in (b) located at
$b_2 = -\pi/2$ is out of phase with the signal, which yields a large negative value of
T(a,b) at $b_2$. (c) The plot of T(a = 1.0, b) as a function of b


Fig. 14.4. Wavelets with a = 0.33 (a) and a = 4.0 (b), in which $b = \pi/2$ is fixed.
The resulting wavelet transform $T(a, b = \pi/2)$ as a function of a is given in (c)


In a similar way, Fig. 14.4a-c shows the dependence of T(a,b) on the
dilatation parameter a. When a is quite small, the positive and negative parts
of the wavelet are all convolved with roughly the same part of the signal x(t),
producing a value of T(a, b) near zero (see Fig. 14.4a). Likewise, T(a, b) tends
to zero as a becomes very large (see Fig. 14.4b), since the wavelet covers many
positive and negative repeating parts of the signal. These two results
indicate that when the dilatation parameter a is either very small or very
large compared with the period of the signal, the wavelet transform T(a, b)
gives near-zero values.

Figure 14.5 shows a contour plot of T(a,b) vs. a and b for a sinusoidal
signal

\[
x(t) = \sin t,
\]

where the Mexican hat wavelet has been used. The light and shadowed regions
indicate positive and negative magnitudes of T(a,b), respectively. The near-
zero values of T(a,b) are evident in the plot at both large and small values
of a. In addition, at intermediate values of a, we observe large undulations
in T(a,b) corresponding to the sinusoidal form of the signal. This wavelike
behavior is accounted for by referring back to Figs. 14.3a-b and 14.4a-b,
where wavelets move in and out of phase with the signal.

Therefore, when the wavelet matches the shape of the signal well at a
specific scale and location, the transform value is high. On the other hand, if
the wavelet and the signal do not correlate well, the transform value is low.
Carrying out the process at various signal locations and for various wavelet
scales, we can determine the correlation between the wavelet and the signal.


Remark. In Fig. 14.5, the maxima and minima of the transform occur at an
a scale of approximately one quarter of the period, $\pi/2$, of the sine wave
$x(t) = \sin t$. This feature holds in general; the correlation between the
wavelet $\psi_{a,b}(t)$ and a signal x(t) with a period p becomes a maximum at
$a \approx p/4$.
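The quarter-period rule in the remark can be probed numerically. The sketch below scans the scale a at the fixed location $b = \pi/2$ for $x(t) = \sin t$, assuming the unnormalized Mexican hat with $\sigma = 1$, the weight $1/\sqrt{a}$, and a Riemann sum on a hypothetical grid; all helper names are illustrative.

```python
import numpy as np

def mexican_hat(u):
    """Mexican hat with sigma = 1; normalization constant omitted (assumption)."""
    return (1.0 - u**2) * np.exp(-0.5 * u**2)

def cwt_point(x, t, a, b):
    """Riemann-sum estimate of T(a, b) with weight 1/sqrt(a)."""
    dt = t[1] - t[0]
    return float(np.sum(x * mexican_hat((t - b) / a)) * dt / np.sqrt(a))

# Hypothetical grid; it must be wide enough for the largest scale scanned.
t = np.linspace(-80.0, 80.0, 32001)
x = np.sin(t)                           # period p = 2*pi

scales = np.linspace(0.2, 6.0, 581)     # scale step 0.01
T = np.array([cwt_point(x, t, a, np.pi / 2) for a in scales])

a_peak = scales[np.argmax(T)]
print("scale of maximum:", a_peak, " p/4 =", np.pi / 2)
```

The scale at which T(a, b) peaks can then be compared directly with $p/4 = \pi/2$; the two should lie close together, in line with the remark.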


Fig. 14.5. Contour plot of the wavelet transform T(a, b) of a sinusoidal wave
$x(t) = \sin t$


14.1.4 Actual Application of the Wavelet Transform 


The wavelet transformation procedure can be applied to signals that have a
more complicated waveform than a simple sinusoidal wave. Consider a signal

\[
x(t) = \sin t + \sin 3t
\]

composed of two sinusoidal waves with different frequencies. The wavelet
transform T(a,b) of x(t) is plotted in Fig. 14.6. It is clear that the
contribution from the wave with the higher-frequency oscillation appears at a
smaller a scale. This demonstrates the ability of the wavelet transform
to decompose the original signal into its separate components.

Fig. 14.6. Wavelet transform T(a,b) of a complicated signal $x(t) = \sin t + \sin 3t$


14.1.5 Inverse Wavelet Transform 


Similar to its Fourier counterpart, there is an inverse wavelet
transformation that enables us to reproduce the original signal x(t) from its
wavelet transform T(a,b).


@ Inverse wavelet transform: 
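If $x \in L^2(\mathbb{R})$, then x can be reconstructed by the equation

\[
x(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} db \int_{0}^{\infty} \frac{da}{a^{2}}\,
T(a,b)\,\psi_{a,b}(t),
\tag{14.7}
\]

where the equality holds almost everywhere.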


The proof of the equation is based on the lemma below. 


@ Parseval identity for wavelet transform: 
Let $T_f(a,b)$, $T_g(a,b)$ be the wavelet transforms of $f(t), g(t) \in L^2(\mathbb{R})$,
respectively, associated with the wavelet $\psi(t)$. Then we have

\[
\int_{0}^{\infty} \frac{da}{a^{2}} \int_{-\infty}^{\infty} db\,
T_f(a,b)\, T_g^{*}(a,b)
= C_\psi \int_{-\infty}^{\infty} f(t)\, g(t)^{*}\, dt.
\tag{14.8}
\]


This identity is derived in Exercise 4. We are now ready to prove the inverse 
transformation (14.7). 


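Proof (of the inverse wavelet transformation): Write f(t) for the signal and
assume an arbitrary real function $g(t) \in L^2(\mathbb{R})$. It follows from the
Parseval identity that

\begin{align*}
C_\psi \int_{-\infty}^{\infty} f(t)\, g(t)\, dt
&= \int_{-\infty}^{\infty} db \int_{0}^{\infty} \frac{da}{a^{2}}\, T_f(a,b)\, T_g(a,b) \\
&= \int_{-\infty}^{\infty} db \int_{0}^{\infty} \frac{da}{a^{2}}\, T_f(a,b)
   \int_{-\infty}^{\infty} g(t)\, \psi_{a,b}(t)\, dt \\
&= \int_{-\infty}^{\infty} dt\, g(t)
   \int_{-\infty}^{\infty} db \int_{0}^{\infty} \frac{da}{a^{2}}\, T_f(a,b)\, \psi_{a,b}(t).
\end{align*}

Since g(t) is arbitrary, the inverse equation (14.7) follows.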


14.1.6 Noise Reduction Technique 


Suppose that the inverse transformation equation (14.7) is rewritten as

\[
x^{*}(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} db \int_{a^{*}}^{\infty} \frac{da}{a^{2}}\,
T(a,b)\,\psi_{a,b}(t),
\]

with the integration range with respect to a restricted to an interval
$[a^{*}, \infty)$ with $a^{*} > 0$. Then, the result $x^{*}(t)$ obtained on the
left-hand side deviates from the original signal x(t) owing to the lack of
information for the scales from a = 0 to $a = a^{*}$. In applications, this
deviation is exploited as a noise reduction technique.

By way of a demonstration, Fig. 14.7a illustrates a segment of the signal

\[
x(t) = \sin t + \sin 3t + R(t)
\]

constructed from two sinusoidal waveforms plus a local burst of noise R(t). The
transform plot of the composite signal shows the two constituent waveforms
at scales $a_1 = \pi/2$ and $a_2 = \pi/6$ in addition to a burst of noise around b = 5.0
in a high-frequency region (i.e., small a scale).

Now we try to remove the high-frequency noise component by means of
the following reconstruction procedure. Figure 14.7b shows a reconstruction
of the signal where we artificially set T(a,b) = 0 for $a < a^{*}$. In effect, we are
reconstructing the signal using

\[
x^{*}(t) = \frac{1}{C_\psi} \int_{-\infty}^{\infty} db \int_{a^{*}}^{\infty} \frac{da}{a^{2}}\,
T(a,b)\,\psi_{a,b}(t),
\]

i.e., over a range of scales $[a^{*}, \infty)$. The lower integral limit, $a^{*}$, is the
cut-off scale indicated by the dotted line in Fig. 14.7b. As a result, the
high-frequency noise component is evidently reduced in the reconstructed
signal, as shown in Fig. 14.7c. This simple noise reduction method is known as
scale-dependent thresholding.

Fig. 14.7. Noise reduction procedure through wavelet transformation. (a) A signal
$x(t) = \sin t + \sin 3t + R(t)$ with a local burst of noise R(t). (b) The wavelet
transform T(a,b) of x(t). Noise reduction is accomplished through the inverse
transformation of T(a, b) by applying an artificial condition of $T(a < a^{*}, b) = 0$.
(c) The reconstructed signal $x^{*}(t)$ from the noise-reduction procedure
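The effect of the cut-off scale on a single sinusoid can be estimated without carrying out the full double integral. The sketch below rests on a short calculation that is not part of the text: for $x(t) = \sin\omega t$, and assuming the unnormalized Mexican hat with $\sigma = 1$ (Fourier transform $\sqrt{2\pi}\, s^2 e^{-s^2/2}$) and the weight $1/\sqrt{a}$, the truncated reconstruction returns the same sinusoid multiplied by the ratio of $\int_{a^{*}\omega}^{\infty} |\Psi(s)|^2/s\, ds$ to $C_\psi$. The function names and the value of `a_star` are hypothetical.

```python
import numpy as np

def admissibility_integrand(s):
    """|Psi(s)|^2 / s for psi(t) = (1 - t^2) exp(-t^2/2), whose Fourier
    transform is Psi(s) = sqrt(2 pi) s^2 exp(-s^2/2)."""
    return 2.0 * np.pi * s**3 * np.exp(-s**2)

def tail_integral(s0, s_max=12.0, n=100001):
    """Trapezoidal estimate of the integral of |Psi|^2/s from s0 to infinity,
    truncated at s_max where the integrand is negligible."""
    if s0 >= s_max:
        return 0.0
    s = np.linspace(s0, s_max, n)
    y = admissibility_integrand(s)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(s)))

def cutoff_gain(omega, a_star):
    """Factor multiplying sin(omega t) once scales a < a_star are discarded."""
    return tail_integral(a_star * omega) / tail_integral(0.0)

a_star = 0.25  # hypothetical cut-off scale
for omega in (1.0, 3.0, 20.0):
    print(omega, cutoff_gain(omega, a_star))
```

Under these assumptions the two components sin t and sin 3t pass almost unchanged, while a component oscillating much faster than $1/a^{*}$ is strongly suppressed, which is the behavior the thresholding procedure relies on.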


Exercises 


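1. Show that the Fourier transform of the Haar wavelet satisfies the
admissibility condition (14.1).

Solution: The Fourier transform $\Psi(\omega)$ of the Haar wavelet $\psi(t)$
is given by

\[
\Psi(\omega) = \int_{0}^{1/2} e^{-i\omega t}\, dt - \int_{1/2}^{1} e^{-i\omega t}\, dt
= i e^{-i\omega/2}\, \frac{\sin^{2}(\omega/4)}{\omega/4}.
\]

Hence, we have

\[
C_\psi = \int_{0}^{\infty} \frac{|\Psi(\omega)|^{2}}{\omega}\, d\omega
= 16 \int_{0}^{\infty} \frac{\sin^{4}(\omega/4)}{\omega^{3}}\, d\omega < \infty.
\]
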
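2. Prove that the Fourier transform of $\psi_{a,b}(t)$ yields
$\Psi_{a,b}(\omega) = \sqrt{a}\, e^{-ib\omega}\, \Psi(a\omega)$.

Solution: It readily follows that

\[
\Psi_{a,b}(\omega) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{a}}\,
\psi\!\left(\frac{t-b}{a}\right) e^{-i\omega t}\, dt.
\]

Set u = (t − b)/a in the last integral to obtain

\[
\Psi_{a,b}(\omega) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty}
e^{-i\omega(au+b)}\, \psi(u)\, a\, du
= \sqrt{a}\, e^{-ib\omega}\, \Psi(a\omega).
\]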


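3. Let $\psi(t)$ be a wavelet and $\phi(t)$ be a real, bounded, and integrable function.
Show that the convolution $\psi * \phi$ is also a wavelet.

Solution: We first show that $\psi * \phi \in L^2(\mathbb{R})$. Observe that, by the
Schwarz inequality,

\begin{align*}
|(\psi * \phi)(t)|^{2}
&= \left| \int_{-\infty}^{\infty} \psi(t-u)\, \phi(u)\, du \right|^{2} \\
&= \left| \int_{-\infty}^{\infty} \psi(t-u)\, |\phi(u)|^{1/2}\, \phi(u)\, |\phi(u)|^{-1/2}\, du \right|^{2} \\
&\le \int_{-\infty}^{\infty} |\psi(t-u)|^{2}\, |\phi(u)|\, du
     \int_{-\infty}^{\infty} |\phi(u')|\, du'.
\end{align*}

The integral $\int_{-\infty}^{\infty} |\phi(u')|\, du'$ is a constant, denoted by A. Integrate
both sides with respect to t to obtain

\begin{align*}
\int_{-\infty}^{\infty} |(\psi * \phi)(t)|^{2}\, dt
&\le A \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} |\psi(t-u)|^{2}\, dt \right] |\phi(u)|\, du \\
&= A \int_{-\infty}^{\infty} |\phi(u)|\, du \int_{-\infty}^{\infty} |\psi(t)|^{2}\, dt
 = A^{2} \int_{-\infty}^{\infty} |\psi(t)|^{2}\, dt < \infty,
\end{align*}

which clearly indicates that $\psi * \phi \in L^2(\mathbb{R})$. Next we show that the
convolution $\psi * \phi$ satisfies the admissibility condition. In fact, since the
Fourier transform of $\psi * \phi$ is $\Psi(\omega)\Phi(\omega)$,

\[
\int_{-\infty}^{\infty} \frac{|\Psi(\omega)\Phi(\omega)|^{2}}{|\omega|}\, d\omega
\le \int_{-\infty}^{\infty} \frac{|\Psi(\omega)|^{2}}{|\omega|}\,
\sup_{\omega} |\Phi(\omega)|^{2}\, d\omega < \infty.
\]

These two results imply that the convolution $\psi * \phi$ is a wavelet.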


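4. Derive the Parseval identity for the wavelet transform (14.8).

Solution: The transform $T_f(a,b)$ reads

\[
T_f(a,b) = \int_{-\infty}^{\infty} f(t)\, \psi_{a,b}(t)\, dt
= \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, \sqrt{a}\, e^{-ib\omega}\, \Psi(a\omega)\, d\omega,
\]

where we used the fact that $\Psi_{a,b}(\omega) = \sqrt{a}\, e^{-ib\omega}\, \Psi(a\omega)$. Similarly,
we have $T_g(a,b) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(\omega)\, \sqrt{a}\, e^{-ib\omega}\, \Psi(a\omega)\, d\omega$.
Hence, we have

\begin{align*}
\int_{0}^{\infty} \frac{da}{a^{2}} \int_{-\infty}^{\infty} db\, T_f(a,b)\, T_g(a,b)
&= \int_{0}^{\infty} \frac{da}{a^{2}} \int_{-\infty}^{\infty} d\omega \int_{-\infty}^{\infty} d\omega'
   \int_{-\infty}^{\infty} db\, \frac{e^{-ib(\omega+\omega')}}{(2\pi)^{2}}\,
   a\, F(\omega)\, G(\omega')\, \Psi(a\omega)\, \Psi(a\omega') \\
&= \frac{1}{2\pi} \int_{0}^{\infty} \frac{da}{a} \int_{-\infty}^{\infty} d\omega \int_{-\infty}^{\infty} d\omega'\,
   F(\omega)\, G(\omega')\, \Psi(a\omega)\, \Psi(a\omega')\, \delta(\omega+\omega') \\
&= \frac{1}{2\pi} \int_{0}^{\infty} \frac{da}{a} \int_{-\infty}^{\infty} d\omega\,
   F(\omega)\, G(-\omega)\, \Psi(a\omega)\, \Psi(-a\omega).
\end{align*}

Since $\psi(t)$ and g(t) are both real, $\Psi(-a\omega) = \Psi(a\omega)^{*}$ and
$G(-\omega) = G^{*}(\omega)$. Thus we have

\begin{align*}
\int_{0}^{\infty} \frac{da}{a^{2}} \int_{-\infty}^{\infty} db\, T_f(a,b)\, T_g(a,b)
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} d\omega\, F(\omega)\, G^{*}(\omega)
   \int_{0}^{\infty} \frac{|\Psi(x)|^{2}}{x}\, dx \\
&= C_\psi \int_{-\infty}^{\infty} f(t)\, g(t)\, dt,
\end{align*}

where $x = a\omega$. This completes the proof.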


14.2 Discrete Wavelet Analysis 


14.2.1 Discrete Wavelet Transforms 


Having discussed the continuous wavelet transform, we move on to its discrete
version, known as the discrete wavelet transform. In many applications,
data are represented by a finite number of values, so it is important and often
useful to consider the discrete version of a wavelet transform. We can also use
a numerical algorithm, called the fast wavelet transform, which
allows us to compute the wavelet transform of the signal and its inverse
efficiently.

We begin with the definition of a discrete wavelet. In the previous
section, the wavelet function was defined at scale a and location b as

\[
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\, \psi\!\left(\frac{t-b}{a}\right),
\]

in which the values of parameters a and b can change continuously. We now
want to discretize the values of a and b. One possible way to sample a and b
is to use a logarithmic discretization of the a scale and link this to the size of
the steps taken between b locations. This kind of discretization yields

\[
\psi_{m,n}(t) = \frac{1}{\sqrt{a_0^{m}}}\,
\psi\!\left(\frac{t - n b_0 a_0^{m}}{a_0^{m}}\right),
\tag{14.9}
\]

where the integers m and n control the wavelet dilation and translation,
respectively; $a_0$ is a specified fixed dilation step parameter and $b_0$ is the
location parameter. In the expression (14.9), the size of the translation steps,
$\Delta b = b_0 a_0^{m}$, is directly proportional to the wavelet scale, $a_0^{m}$.

Common choices for the discrete wavelet parameters $a_0$ and $b_0$ are 1/2 and
1, respectively. This power-of-two logarithmic scaling of the dilation steps is
known as the dyadic grid arrangement. Substituting $a_0 = 1/2$ and $b_0 = 1$
into (14.9), we obtain the dyadic grid wavelet represented by

\[
\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n).
\tag{14.10}
\]


Using the dyadic grid wavelet of (14.10), we arrive at the discrete wavelet
transform of a continuous signal x(t):

@ Discrete wavelet transform:

\[
T_{m,n} = \int_{-\infty}^{\infty} x(t)\, \psi_{m,n}(t)\, dt
= \int_{-\infty}^{\infty} x(t)\, 2^{m/2}\, \psi(2^{m} t - n)\, dt.
\tag{14.11}
\]

Remark. Note that the discrete wavelet transform (14.11) differs from the
discretized approximation of the continuous wavelet transform given by

\[
T(a,b) = \int_{-\infty}^{\infty} x(t)\, \psi_{a,b}(t)\, dt
\simeq \sum_{l=-\infty}^{\infty} x(l\Delta t)\, \psi_{a,b}(l\Delta t)\, \Delta t.
\tag{14.12}
\]

In (14.12), the integration variable t is discretized, while a and b remain
continuous variables whose values can be chosen arbitrarily. On the other hand,
in the discrete wavelet transform (14.11), a and b are discretized and t remains
continuous.


14.2.2 Complete Orthonormal Wavelets 


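The fundamental question is whether the original signal x(t) can be
reconstructed from the discrete wavelet transform $T_{m,n}$ through the relation

\[
x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} T_{m,n}\, \psi_{m,n}(t).
\tag{14.13}
\]

As intuitively understood, the reconstruction equation (14.13) is justified if
the discretized wavelets $\psi_{m,n}(t)$ are orthonormal and complete. The
completeness of $\psi_{m,n}(t)$ implies that any function $x \in L^2(\mathbb{R})$ can be
expanded by

\[
x(t) = \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_{m,n}\, \psi_{m,n}(t)
\tag{14.14}
\]

with appropriate expansion coefficients $c_{m,n}$. Hence, the orthonormality

\[
\int_{-\infty}^{\infty} \psi_{m,n}(t)\, \psi_{m',n'}(t)\, dt = \delta_{m,m'}\, \delta_{n,n'}
\tag{14.15}
\]

results in $c_{m,n} = T_{m,n}$ in (14.14) because

\begin{align*}
T_{m,n} = \int_{-\infty}^{\infty} x(t)\, \psi_{m,n}(t)\, dt
&= \int_{-\infty}^{\infty} \left[ \sum_{m'=-\infty}^{\infty} \sum_{n'=-\infty}^{\infty}
   c_{m',n'}\, \psi_{m',n'}(t) \right] \psi_{m,n}(t)\, dt \\
&= \sum_{m'=-\infty}^{\infty} \sum_{n'=-\infty}^{\infty} c_{m',n'}
   \int_{-\infty}^{\infty} \psi_{m,n}(t)\, \psi_{m',n'}(t)\, dt \\
&= \sum_{m'=-\infty}^{\infty} \sum_{n'=-\infty}^{\infty} c_{m',n'}\,
   \delta_{m,m'}\, \delta_{n,n'} = c_{m,n}.
\end{align*}

In general, however, the wavelets $\psi_{m,n}(t)$ given by (14.9) are neither
orthonormal nor complete. We thus arrive at the following theorem: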


@ Validity of the inverse transformation formula: 

The inverse transformation formula (14.13) is valid only for a limited
class of sets of discrete wavelets $\{\psi_{m,n}(t)\}$ that are endowed with both
orthonormality and completeness.


The simplest example of such desired wavelets is the Haar discrete wavelet 
presented below. 


Examples The Haar discrete wavelet is defined by

\[
\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n),
\]

where

\[
\psi(t) =
\begin{cases}
 1, & 0 \le t < 1/2, \\
-1, & 1/2 \le t < 1, \\
 0, & \text{otherwise}.
\end{cases}
\]


This wavelet is known to be orthonormal and complete; its orthonormality is 
verified in Exercise 1. 
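The orthonormality of the Haar discrete wavelets can also be spot-checked numerically. The sketch below is a minimal check, assuming the dyadic form $\psi_{m,n}(t) = 2^{m/2}\psi(2^m t - n)$ and a hypothetical midpoint grid on [-8, 8) whose cells are fine enough that every wavelet tested is constant on each cell, so that the sums equal the integrals. It says nothing about completeness.

```python
import numpy as np

def haar(u):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 otherwise."""
    u = np.asarray(u, dtype=float)
    return (np.where((u >= 0) & (u < 0.5), 1.0, 0.0)
            - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0))

def haar_mn(t, m, n):
    """Dyadic Haar wavelet psi_{m,n}(t) = 2^(m/2) psi(2^m t - n)."""
    return 2.0 ** (m / 2) * haar(2.0 ** m * t - n)

# Midpoints of cells of width 2^-8 covering [-8, 8) (hypothetical choice).
h = 2.0 ** -8
t = -8.0 + (np.arange(16 * 256) + 0.5) * h

def inner(m, n, k, l):
    """Integral of psi_{m,n}(t) psi_{k,l}(t) dt, evaluated as a midpoint sum."""
    return float(np.sum(haar_mn(t, m, n) * haar_mn(t, k, l)) * h)

# Pairs (m, n, k, l): equal indices, shifted, and wavelets at different scales.
for idx in [(0, 0, 0, 0), (0, 0, 0, 1), (1, 0, 0, 0), (2, 3, 1, 1), (-2, 0, 0, 1)]:
    print(idx, inner(*idx))
```

Only the pair with identical indices should give a value of one; the others should vanish up to rounding.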


14.2.3 Multiresolution Analysis 


We know from Sect. 14.2.2 that in order to use equation (14.13), we must find
an appropriate set of discrete wavelets $\{\psi_{m,n}\}$ that possesses both
orthonormality and completeness. In the remainder of this section, we describe
a framework for constructing such discrete wavelets that is based on the
concept of multiresolution analysis.

Multiresolution analysis involves a particular class of sets of function
spaces. Its most distinctive feature is that it establishes a nesting structure
of subspaces of $L^2(\mathbb{R})$ that allows us to construct a complete orthonormal
set of functions (i.e., an orthonormal basis) for $L^2(\mathbb{R})$. The resulting
orthonormal basis is simply the discrete wavelet $\psi_{m,n}(t)$ that yields the
reconstruction equation (14.13).


@ Multiresolution analysis: A multiresolution analysis involves a set
of function spaces that consists of a sequence $\{V_j : j \in \mathbb{Z}\}$ of closed
subspaces of $L^2(\mathbb{R})$. Here the subspaces $V_j$ satisfy the following conditions:

1. $\cdots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2(\mathbb{R})$.

2. $\bigcap_{j=-\infty}^{\infty} V_j = \{0\}$.

3. $f(t) \in V_j$ if and only if $f(2t) \in V_{j+1}$ for all integers j.

4. There exists a function $\phi(t) \in V_0$ such that the set $\{\phi(t-n),\ n \in \mathbb{Z}\}$
is an orthonormal basis for $V_0$.


The function $\phi(t)$ introduced above is called the scaling function (or father
wavelet). It should be emphasized that the above definition gives no
information as to the existence of (or the way to construct) the function
$\phi(t)$ satisfying condition 4. However, once we find such a function $\phi(t)$, we
can establish a multiresolution analysis $\{V_j\}$ by defining the function space
$V_0$ spanned by the orthonormal basis $\{\phi(t-n),\ n \in \mathbb{Z}\}$ and then forming
other subspaces $V_j$ ($j \ne 0$) successively by using the property denoted in
condition 3. If this is achieved, we say that our scaling function $\phi(t)$
generates the multiresolution analysis $\{V_j\}$.


Remark. There is no straightforward way to construct a scaling function $\phi(t)$
or, equivalently, a multiresolution analysis $\{V_j\}$. Nevertheless, many kinds of
scaling functions have been discovered by means of sophisticated
mathematical techniques. Here we omit the details of the derivations and just
refer to the resulting scaling function as needed.


Examples Consider the space $V_m$ of all functions in $L^2(\mathbb{R})$ that are constant
on each interval $[2^{-m}n,\ 2^{-m}(n+1)]$ for all $n \in \mathbb{Z}$. Obviously, the space $V_m$
satisfies conditions 1-3 of a multiresolution analysis. Furthermore, it is easy
to see that the set $\{\phi(t-n),\ n \in \mathbb{Z}\}$ depicted in Fig. 14.8, which is defined
by

\[
\phi(t) =
\begin{cases}
1, & 0 \le t < 1, \\
0, & \text{otherwise},
\end{cases}
\tag{14.16}
\]

satisfies condition 4. Hence, any function $f \in V_0$ can be expressed by

\[
f(t) = \sum_{n=-\infty}^{\infty} c_n\, \phi(t-n)
\]

with appropriate constants $c_n$. Thus, the spaces $V_m$ constitute the
multiresolution analysis generated by the scaling function (14.16).


14.2.4 Orthogonal Decomposition 


The importance of a multiresolution analysis lies in its ability to construct an
orthonormal basis (i.e., a complete orthonormal set of functions) for $L^2(\mathbb{R})$.

Fig. 14.8. Two different sets of functions: (a) orthonormal basis for $V_0$;
(b) orthonormal basis for $V_1$



In order to prove this statement, we first recall that a multiresolution analysis
$\{V_j\}$ satisfies the relation

\[
V_0 \subset V_1 \subset V_2 \subset \cdots \subset L^2.
\]

We now define a space $W_0$ as the orthogonal complement of $V_0$ in $V_1$,
which yields

\[
V_1 = V_0 \oplus W_0.
\tag{14.17}
\]

The space $W_0$ we have introduced is called the wavelet space of zero order;
the reason for the name is clarified in Sect. 14.2.5. The relation (14.17) extends
to

\[
V_2 = V_1 \oplus W_1 = V_0 \oplus W_0 \oplus W_1
\tag{14.18}
\]

or, more generally, it gives

\[
L^2 = V_\infty = V_0 \oplus W_0 \oplus W_1 \oplus W_2 \oplus \cdots,
\tag{14.19}
\]

where $V_0$ is the initial space spanned by the set of functions
$\{\phi(t-n),\ n \in \mathbb{Z}\}$. Figure 14.9 illustrates the nesting structure of the
spaces $V_j$ and $W_j$ for different scales j.

Fig. 14.9. Hierarchical structure of the spaces $V_j$ and $W_j$ as subspaces of $L^2$

Since the scale of the initial space is arbitrary, it can be chosen at a higher
resolution such as

\[
L^2 = V_5 \oplus W_5 \oplus W_6 \oplus \cdots,
\]

or at a lower resolution such as

\[
L^2 = V_{-3} \oplus W_{-3} \oplus W_{-2} \oplus \cdots,
\]

or even at negative infinity, where (14.19) becomes

\[
L^2 = \cdots \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \cdots.
\tag{14.20}
\]

The expression (14.20) is referred to as the orthogonal decomposition of
the $L^2$ space and indicates that any function $x \in L^2(\mathbb{R})$ can be decomposed
into the infinite sum of $g_j \in W_j$:

\[
x(t) = \cdots + g_{-1}(t) + g_0(t) + g_1(t) + \cdots.
\tag{14.21}
\]


14.2.5 Constructing an Orthonormal Basis 


Let us further examine the orthogonal property of the wavelet spaces $\{W_j\}$.
From (14.17) and (14.18), we have

\[
W_0 \subset V_1 \quad \text{and} \quad W_1 \subset V_2.
\]

In view of the definition of the multiresolution analysis $\{V_j\}$, it follows that

\[
f(t) \in V_1 \iff f(2t) \in V_2,
\]

so

\[
f(t) \in W_0 \iff f(2t) \in W_1.
\tag{14.22}
\]

Furthermore, condition 4 in Sect. 14.2.3 results in

\[
f(t) \in W_0 \iff f(t-n) \in W_0 \quad \text{for any } n \in \mathbb{Z}.
\tag{14.23}
\]

The two results (14.22) and (14.23) are ingredients for constructing the
orthonormal basis of $L^2(\mathbb{R})$ that we are looking for, as demonstrated
below.

We first assume that there exists a function $\psi(t)$ that leads to an
orthonormal basis $\{\psi(t-n),\ n \in \mathbb{Z}\}$ for the space $W_0$. Then, if we use
the notation

\[
\psi_{0,n}(t) = \psi(t-n) \in W_0,
\]

it follows from (14.22) and (14.23) that its scaled version defined by

\[
\psi_{1,n}(t) = \sqrt{2}\, \psi(2t-n)
\]

serves as an orthonormal basis for $W_1$. The factor $\sqrt{2}$ was introduced to keep
the normalization condition

\[
\int_{-\infty}^{\infty} \psi_{0,n}(t)^{2}\, dt = \int_{-\infty}^{\infty} \psi_{1,n}(t)^{2}\, dt = 1.
\]

By repeating the same procedure, we find that the function

\[
\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n)
\tag{14.24}
\]

constitutes an orthonormal basis for the space $W_m$. Applying these results to
the expression (14.21), we have for any $x \in L^2(\mathbb{R})$,

\begin{align*}
x(t) &= \cdots + g_{-1}(t) + g_0(t) + g_1(t) + \cdots \\
&= \cdots + \sum_{n=-\infty}^{\infty} c_{-1,n}\, \psi_{-1,n}(t)
   + \sum_{n=-\infty}^{\infty} c_{0,n}\, \psi_{0,n}(t)
   + \sum_{n=-\infty}^{\infty} c_{1,n}\, \psi_{1,n}(t) + \cdots \\
&= \sum_{m=-\infty}^{\infty} \sum_{n=-\infty}^{\infty} c_{m,n}\, \psi_{m,n}(t).
\tag{14.25}
\end{align*}

Hence, the family $\psi_{m,n}(t)$ represents an orthonormal basis for $L^2(\mathbb{R})$. The
above arguments are summarized by the following theorem:


@ Theorem: 

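Let $\{V_j\}$ be a multiresolution analysis and define the space $W_0$ by
$W_0 = V_1 \setminus V_0$ (i.e., the orthogonal complement of $V_0$ in $V_1$; see (14.17)).
If a function $\psi(t)$ that leads to an orthonormal basis $\{\psi(t-n),\ n \in \mathbb{Z}\}$
for $W_0$ is found, then the set of functions $\{\psi_{m,n},\ m, n \in \mathbb{Z}\}$ given by

\[
\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n)
\]

constitutes an orthonormal basis for $L^2(\mathbb{R})$.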


Emphasis is placed on the fact that since $\psi_{m,n}(t)$ is an orthonormal basis
for $L^2(\mathbb{R})$, the coefficients $c_{m,n}$ in (14.25) are identical to the discrete
wavelet transform $T_{m,n}$ given by (14.11) (see Sect. 14.2.2). Therefore, the
function $\psi(t)$ we introduce here is identified with the wavelet in the
framework of continuous and discrete wavelet analysis, such as the Haar and the
Mexican hat wavelets. In this sense, each $W_m$ is referred to as the wavelet
space and the function $\psi(t)$ is sometimes called the mother wavelet.


14.2.6 Two-Scale Relations 


The preceding argument suggests that an orthonormal basis $\{\psi_{m,n}\}$ for
$L^2(\mathbb{R})$ can be constructed by specifying the explicit functional form of the
mother wavelet $\psi(t)$. Thus the remaining task is to develop a systematic way
of determining the mother wavelet $\psi(t)$ that leads to an orthonormal basis
$\{\psi(t-n),\ n \in \mathbb{Z}\}$ for the space $W_0 = V_1 \setminus V_0$ contained in a given
multiresolution analysis. We shall see that $\psi(t)$ can be found by examining
the properties of the scaling function $\phi(t)$; we should recall that $\phi(t)$
yields an orthonormal basis $\{\phi(t-n),\ n \in \mathbb{Z}\}$ for the space $V_0$. (In this
context, the space $V_j$ is sometimes referred to as the scaling function
space.)


In this subsection, we make reference to an important feature of the
scaling function $\phi(t)$ called the two-scale relation, which plays a key role in
constructing the mother wavelet $\psi(t)$ of a given multiresolution analysis. We
already know that all the functions in $V_m$ are obtained from those in $V_0$
through scaling by $2^{m}$. Applying this result to the scaling function denoted
by

\[
\phi_{0,n}(t) = \phi(t-n) \in V_0,
\]

we find that

\[
\phi_{m,n}(t) = 2^{m/2}\, \phi(2^{m} t - n), \quad m \in \mathbb{Z}
\tag{14.26}
\]

is an orthonormal basis for $V_m$. In particular, since $\phi \in V_0 \subset V_1$ and
$\phi_{1,n}(t) = \sqrt{2}\, \phi(2t-n)$ is an orthonormal basis for $V_1$, $\phi(t)$ can be
expanded by $\phi_{1,n}(t)$. This is formally stated in the following theorem:


@ Two-scale relation: 
If the scaling function $\phi(t)$ generates a multiresolution analysis $\{V_j\}$, it
satisfies the recurrence relation:

\[
\phi(t) = \sum_{n=-\infty}^{\infty} p_n\, \phi_{1,n}(t)
= \sqrt{2} \sum_{n=-\infty}^{\infty} p_n\, \phi(2t-n),
\tag{14.27}
\]

where

\[
p_n = \int_{-\infty}^{\infty} \phi(t)\, \phi_{1,n}(t)\, dt.
\tag{14.28}
\]

This recurrence equation is called the two-scale relation of $\phi(t)$ and the
coefficients $p_n$ are called the scaling function coefficients.


Remark. The two-scale relation is also referred to as the multiresolution 
analysis equation, the refinement equation, or the dilation equation, 
depending on the context. 


Examples Consider again the space $V_m$ of all functions in $L^2(\mathbb{R})$ that are
constant on intervals $[2^{-m}n,\ 2^{-m}(n+1)]$ with $n \in \mathbb{Z}$. This multiresolution
analysis is known to be generated by the scaling function $\phi(t)$ of (14.16).
Substituting (14.16) into (14.28), we obtain

\[
p_0 = p_1 = \frac{1}{\sqrt{2}} \quad \text{and} \quad p_n = 0 \ \text{for } n \ne 0, 1.
\]

Thus the two-scale relation reads

\[
\phi(t) = \phi(2t) + \phi(2t-1).
\]

This means that the scaling function $\phi(t)$ in this case is a linear combination
of its contracted versions as depicted in Fig. 14.10.

Fig. 14.10. Two-scale relation of $\phi(t)$
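The coefficients of this example can be reproduced by evaluating (14.28) numerically. The sketch below assumes the box scaling function of (14.16), taken here as the indicator of [0, 1), and a hypothetical midpoint grid on which all functions involved are piecewise constant; the names `phi`, `p` and `coeffs` are illustrative.

```python
import numpy as np

def phi(t):
    """Scaling function of (14.16), taken as the indicator of [0, 1)."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

# Midpoints of cells of width 2^-6 covering [-2, 3) (hypothetical choice).
h = 2.0 ** -6
t = -2.0 + (np.arange(320) + 0.5) * h

def p(n):
    """p_n = integral of phi(t) * sqrt(2) * phi(2t - n) dt, cf. (14.28)."""
    return float(np.sum(phi(t) * np.sqrt(2.0) * phi(2 * t - n)) * h)

coeffs = {n: p(n) for n in range(-3, 5)}
print(coeffs)

# Right-hand side of the two-scale relation (14.27) built from these p_n.
rhs = sum(np.sqrt(2.0) * c * phi(2 * t - n) for n, c in coeffs.items())
print(np.allclose(rhs, phi(t)))
```

Only $p_0$ and $p_1$ should come out nonzero, and the reconstructed right-hand side should coincide with $\phi(t)$ on the grid.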


14.2.7 Constructing the Mother Wavelet 


We are now in a position to determine the mother wavelet $\psi(t)$ that enables
us to establish an orthonormal basis $\{\psi(t-n),\ n \in \mathbb{Z}\}$ for $W_0$ and
thereby, through (14.24), an orthonormal basis for $L^2(\mathbb{R})$. Recall that
a mother wavelet $\psi(t) = \psi_{0,0}(t)$ resides in the space $W_0$, which is contained
in the next scaling function space $V_1$, i.e., $W_0 \subset V_1$. Hence, in the same
manner as in the previous subsection, $\psi(t)$ can be represented by a weighted
sum of shifted versions of the contracted scaling function $\phi(2t)$:

\[
\psi(t) = \sum_{n=-\infty}^{\infty} q_n\, \sqrt{2}\, \phi(2t-n), \quad n \in \mathbb{Z}.
\tag{14.29}
\]

The expansion coefficients $q_n$ are called wavelet coefficients and are given
by

\[
q_n = (-1)^{n-1}\, p_{-n-1}
\tag{14.30}
\]

as stated below.

@ Theorem:
If $\{V_m\}$ is a multiresolution analysis with the scaling function $\phi(t)$, the
mother wavelet $\psi(t)$ is given by

\[
\psi(t) = \sqrt{2} \sum_{n=-\infty}^{\infty} (-1)^{n-1}\, p_{-n-1}\, \phi(2t-n), \quad n \in \mathbb{Z},
\tag{14.31}
\]

where $p_n$ is the scaling function coefficient of $\phi(t)$.

Remember that $p_n$ in (14.31) is uniquely determined by the functional form
of the scaling function $\phi(t)$; see (14.28). Thus the above theorem states that
the mother wavelet $\psi(t)$ is obtained once the scaling function $\phi(t)$ of a given
multiresolution analysis is specified.


Remark. The relation $q_n = (-1)^{n-1} p_{-n-1}$ employed in equation (14.31) is one
possible choice for constructing the mother wavelet $\psi(t)$ from the father
wavelet $\phi(t)$. In fact, there are alternative choices such as

\[
q_n = (-1)^{n}\, p_{1-n}
\]

or

\[
q_n = (-1)^{n-1}\, p_{2N-1-n}
\]

with certain $N \in \mathbb{Z}$. Hence, the mother wavelet $\psi(t)$ associated with a given
multiresolution analysis is not unique. In practice, however, any of the
preceding definitions of $q_n$ can be used to obtain a mother wavelet $\psi(t)$
because each leads to an orthonormal basis for the space $W_0$.
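The non-uniqueness noted in the remark can be illustrated for the Haar case. The sketch below builds $\psi(t) = \sum_n q_n \sqrt{2}\,\phi(2t-n)$ from the Haar scaling coefficients $p_0 = p_1 = 1/\sqrt{2}$ for two index conventions, taken here to be $q_n = (-1)^{n-1} p_{-n-1}$ and $q_n = (-1)^{n} p_{1-n}$. These two formulas are the reading of the relations assumed for this example, and the grid and helper names are hypothetical.

```python
import numpy as np

def phi(t):
    """Haar scaling function, taken as the indicator of [0, 1)."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= 0) & (t < 1), 1.0, 0.0)

def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1)."""
    return phi(2 * t) - phi(2 * t - 1)

# Scaling function coefficients of the Haar example: p_0 = p_1 = 1/sqrt(2).
p = {0: 1 / np.sqrt(2), 1: 1 / np.sqrt(2)}

def wavelet_from(q, t):
    """psi(t) = sum_n q_n sqrt(2) phi(2t - n)."""
    return sum(qn * np.sqrt(2) * phi(2 * t - n) for n, qn in q.items())

idx = range(-4, 5)
# Assumed convention A: q_n = (-1)^(n-1) p_{-n-1}
q_a = {n: (-1.0) ** (n - 1) * p[-n - 1] for n in idx if (-n - 1) in p}
# Assumed convention B: q_n = (-1)^n p_{1-n}
q_b = {n: (-1.0) ** n * p[1 - n] for n in idx if (1 - n) in p}

# Midpoints of cells of width 2^-6 covering [-2, 2) (hypothetical choice).
h = 2.0 ** -6
t = -2.0 + (np.arange(256) + 0.5) * h
psi_a = wavelet_from(q_a, t)
psi_b = wavelet_from(q_b, t)

print(np.allclose(psi_b, haar(t)))        # convention B: the Haar wavelet on [0, 1)
print(np.allclose(psi_a, -haar(t + 1)))   # convention A: a shifted, sign-flipped copy
```

Both results have unit norm and are orthogonal to the integer translates of $\phi(t)$, so under these assumptions either one generates an orthonormal basis of $W_0$; they differ only by an integer shift and a sign.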


The proof of equation (14.31) requires the following two lemmas: 


@ Lemma 1:
The Fourier transform $\Phi(\omega)$ of the scaling function $\phi(t)$ satisfies

\[
\Phi(\omega) = M\!\left(\frac{\omega}{2}\right) \Phi\!\left(\frac{\omega}{2}\right),
\]

where $M(\omega)$ is the generating function of the multiresolution
analysis defined by

\[
M(\omega) = \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{\infty} p_n\, e^{-in\omega}
\tag{14.32}
\]

with the scaling function coefficient $p_n$ of $\phi(t)$.
@ Lemma 2:
The Fourier transform $F(\omega)$ of any function $f \in W_0$ can be expressed
by

\[
F(\omega) = V(\omega)\, e^{i\omega/2}\, M^{*}\!\left(\frac{\omega}{2} + \pi\right)
\Phi\!\left(\frac{\omega}{2}\right),
\tag{14.33}
\]

where $V(\omega)$ is a $2\pi$-periodic function, i.e., $V(\omega) = V(\omega + 2\pi)$.

We should keep in mind that $V(\omega)$ is the only term on the right-hand side of
(14.33) that depends on f(t); the remainder term
$e^{i\omega/2} M^{*}[(\omega/2) + \pi]\, \Phi(\omega/2)$ is independent of f(t). The proofs of
the two lemmas are outlined in Exercises 3 and 4. Now we turn to a proof of
equation (14.31) for the construction of the mother wavelet $\psi(t)$ from the
scaling function $\phi(t)$.


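Proof (of Theorem): Since the mother wavelet $\psi(t)$ gives an orthonormal
basis $\{\psi(t-n),\ n \in \mathbb{Z}\}$ for the space $W_0$, any function $f \in W_0$ can be
expressed by

\[
f(t) = \sum_{n=-\infty}^{\infty} h_n\, \psi(t-n)
\]

with appropriate coefficients $h_n$. Its Fourier transform $F(\omega)$ reads

\[
F(\omega) = \left( \sum_{n=-\infty}^{\infty} h_n\, e^{-in\omega} \right) \Psi(\omega),
\]

where the sum in parentheses is $2\pi$-periodic. Comparing this with (14.33), we
obtain

\[
\Psi(\omega) = e^{i\omega/2}\, M^{*}\!\left(\frac{\omega}{2} + \pi\right)
\Phi\!\left(\frac{\omega}{2}\right).
\tag{14.34}
\]

Substituting expression (14.32) into (14.34) yields

\begin{align*}
\Psi(\omega)
&= e^{i\omega/2}\, \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{\infty} p_n\,
   e^{in[(\omega/2)+\pi]}\, \Phi\!\left(\frac{\omega}{2}\right) \\
&= \frac{1}{\sqrt{2}} \sum_{n=-\infty}^{\infty} p_n\, (-1)^{n}\,
   e^{i(n+1)(\omega/2)}\, \Phi\!\left(\frac{\omega}{2}\right) \\
&= \frac{1}{\sqrt{2}} \sum_{k=-\infty}^{\infty} p_{-k-1}\, (-1)^{k-1}\,
   e^{-ik\omega/2}\, \Phi\!\left(\frac{\omega}{2}\right)
   \qquad [k = -n-1].
\end{align*}

Take the inverse Fourier transform of both sides to find

\begin{align*}
\psi(t)
&= \frac{1}{\sqrt{2}} \sum_{k=-\infty}^{\infty} p_{-k-1}\, (-1)^{k-1}\,
   \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-ik\omega/2}\, e^{i\omega t}\,
   \Phi\!\left(\frac{\omega}{2}\right) d\omega \\
&= \frac{2}{\sqrt{2}} \sum_{k=-\infty}^{\infty} p_{-k-1}\, (-1)^{k-1}\,
   \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(2t-k)\omega'}\, \Phi(\omega')\, d\omega'
   \qquad [\omega' = \omega/2] \\
&= \sqrt{2} \sum_{k=-\infty}^{\infty} p_{-k-1}\, (-1)^{k-1}\, \phi(2t-k).
\end{align*}

This is our desired result (14.31).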


14.2.8 Multiresolution Representation 


Through the discussions thus far, we have obtained an orthonormal basis
consisting of scaling functions $\phi_{j,k}(t)$ and wavelets $\psi_{j,k}(t)$ that span all of
$L^2(\mathbb{R})$. Since

\[
L^2 = V_{j_0} \oplus W_{j_0} \oplus W_{j_0+1} \oplus \cdots,
\]

any function $x(t) \in L^2(\mathbb{R})$ can be expanded, e.g., as

\[
x(t) = \sum_{k=-\infty}^{\infty} S_{j_0,k}\, \phi_{j_0,k}(t)
+ \sum_{k=-\infty}^{\infty} \sum_{j=j_0}^{\infty} T_{j,k}\, \psi_{j,k}(t).
\tag{14.35}
\]

Here, the initial scale $j_0$ could be zero or another integer or negative infinity
as in (14.13), where no scaling functions are used. The coefficients $T_{j,k}$ are
identified with the discrete wavelet transform given in (14.11). Often $T_{j,k}$ in
(14.35) is called the wavelet coefficient and $S_{j,k}$ is called the
approximation coefficient.

The representation (14.35) can be simplified by using the following
notation. We denote the first summation on the right-hand side of (14.35) by

\[
x_{j_0}(t) = \sum_{k=-\infty}^{\infty} S_{j_0,k}\, \phi_{j_0,k}(t).
\tag{14.36}
\]

Equation (14.36) is called the continuous approximation of the signal x(t)
at scale $j_0$. Observe that the continuous approximation approaches x(t) in the
limit of $j_0 \to \infty$, since in this case $L^2 = V_\infty$. In addition, we introduce the
notation

\[
z_j(t) = \sum_{k=-\infty}^{\infty} T_{j,k}\, \psi_{j,k}(t),
\tag{14.37}
\]

where $z_j(t)$ is known as the signal detail at scale j. With these conventions,
we can write (14.35) as

\[
x(t) = x_{j_0}(t) + \sum_{j=j_0}^{\infty} z_j(t).
\tag{14.38}
\]

Equation (14.38) says that the original continuous signal x(t) is expressed as
a combination of its continuous approximation $x_{j_0}$ at an arbitrary scale index
$j_0$ added to a succession of signal details $z_j(t)$ from scales $j_0$ up to infinity.
Also noteworthy is the fact that due to the nested relation of
$V_{j+1} = V_j \oplus W_j$, we can write

\[
x_{j+1}(t) = x_j(t) + z_j(t).
\tag{14.39}
\]

This indicates that if we add the signal detail at an arbitrary scale (index
j) to the continuous approximation at the same scale, we get the signal
approximation at an increased resolution (i.e., at a smaller scale, index j + 1).
The important relation (14.39) between continuous approximations $x_j(t)$ and
signal details $z_j(t)$ is called a multiresolution representation.
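The relation between approximations and details can be checked on a concrete case. The sketch below assumes the Haar multiresolution analysis restricted to the unit interval: the signal is a hypothetical step function that is constant on 64 cells of [0, 1) (so it lies in $V_6$), its approximation in $V_j$ is taken to be the block average over cells of width $2^{-j}$, and the detail $z_j$ is built from Haar coefficients $T_{j,k}$ computed as midpoint sums. All function names are illustrative.

```python
import numpy as np

def approximation(x, j):
    """Projection onto V_j of a step function given by its 2^J cell values on [0, 1):
    block averages over cells of width 2^-j."""
    block = len(x) // 2 ** j
    return np.repeat(x.reshape(-1, block).mean(axis=1), block)

def haar_detail(x, j):
    """Signal detail z_j(t) = sum_k T_{j,k} psi_{j,k}(t) for the Haar wavelet,
    sampled at the midpoints of the same 2^J cells."""
    n_cells = len(x)
    dt = 1.0 / n_cells
    t = (np.arange(n_cells) + 0.5) * dt
    z = np.zeros(n_cells)
    for k in range(2 ** j):
        u = 2.0 ** j * t - k
        psi = 2.0 ** (j / 2) * (np.where((u >= 0) & (u < 0.5), 1.0, 0.0)
                                - np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0))
        T_jk = np.sum(x * psi) * dt          # wavelet coefficient as a midpoint sum
        z += T_jk * psi
    return z

# Hypothetical signal: samples of a smooth curve held constant on 64 cells.
cells = (np.arange(64) + 0.5) / 64
x = np.sin(2 * np.pi * cells) + 0.5 * np.cos(6 * np.pi * cells)

for j in range(6):
    lhs = approximation(x, j + 1)
    rhs = approximation(x, j) + haar_detail(x, j)
    print(j, np.allclose(lhs, rhs))
```

Each pass of the loop compares the approximation at index j + 1 with the sum of the approximation and the detail at index j, which is the content of the multiresolution representation for this simple setting.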


Exercises 


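1. Verify the orthonormality of the Haar discrete wavelet $\psi_{m,n}(t)$ defined by
$\psi_{m,n}(t) = 2^{m/2}\, \psi(2^{m} t - n)$, where

\[
\psi(t) =
\begin{cases}
 1, & 0 \le t < 1/2, \\
-1, & 1/2 \le t < 1, \\
 0, & \text{otherwise}.
\end{cases}
\]

Solution: First we note that the norm of $\psi_{m,n}(t)$ is unity:

\[
\int_{-\infty}^{\infty} \psi_{m,n}(t)^{2}\, dt
= 2^{m} \int_{-\infty}^{\infty} \left[\psi(2^{m} t - n)\right]^{2} dt
= 2^{m}\, 2^{-m} \int_{-\infty}^{\infty} \psi(u)^{2}\, du = 1.
\]

Next, we consider

\begin{align*}
I \equiv \int_{-\infty}^{\infty} \psi_{m,n}(t)\, \psi_{k,\ell}(t)\, dt
&= \int_{-\infty}^{\infty} 2^{m/2}\, \psi(2^{m} t - n)\, 2^{k/2}\, \psi(2^{k} t - \ell)\, dt \\
&= 2^{m/2}\, 2^{-m} \int_{-\infty}^{\infty} \psi(u)\, 2^{k/2}\,
   \psi\!\left[2^{k-m}(u+n) - \ell\right] du.
\tag{14.40}
\end{align*}

If m = k, the integral in the last line in (14.40) reads

\[
\int_{-\infty}^{\infty} \psi(u)\, \psi(u + n - \ell)\, du = \delta_{0,n-\ell} = \delta_{n,\ell},
\]

since $\psi(u) \ne 0$ in $0 \le u < 1$ and $\psi(u+n-\ell) \ne 0$ in
$\ell - n \le u < \ell - n + 1$, so that these intervals are disjoint unless $n = \ell$.
Owing to symmetry if $m \ne k$, it suffices to look at the case of $k > m$. Set
$r = k - m > 0$ and $s = 2^{r} n - \ell$ in (14.40) to obtain

\begin{align*}
I &= 2^{r/2} \int_{-\infty}^{\infty} \psi(u)\, \psi(2^{r} u + s)\, du \\
  &= 2^{r/2} \left[ \int_{0}^{1/2} \psi(2^{r} u + s)\, du
     - \int_{1/2}^{1} \psi(2^{r} u + s)\, du \right],
\end{align*}

which can be simplified as

\[
I = 2^{-r/2} \left[ \int_{s}^{a} \psi(x)\, dx - \int_{a}^{b} \psi(x)\, dx \right] = 0,
\tag{14.41}
\]

where $2^{r} u + s = x$, $a = s + 2^{r-1}$, $b = s + 2^{r}$. Observe that s and a are
integers, so the interval [s, a] either contains the support [0, 1] of the Haar
wavelet $\psi(t)$ or meets it at most at an endpoint; in both cases the first
integral in (14.41) vanishes because $\int_{0}^{1} \psi(x)\, dx = 0$. Similarly, the
second integral equals zero. We thus conclude that

\[
I = \int_{-\infty}^{\infty} \psi_{m,n}(t)\, \psi_{k,\ell}(t)\, dt = \delta_{m,k}\, \delta_{n,\ell},
\]

which means that the Haar discrete wavelet $\psi_{m,n}(t)$ is orthonormal.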


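2. Let $\phi \in L^2(\mathbb{R})$ and $\Phi(\omega)$ be the Fourier transform of $\phi(t)$. Prove that
the system $\{\phi_{0,n} = \phi(t-n),\ n \in \mathbb{Z}\}$ is orthonormal if and only if
$\sum_{k=-\infty}^{\infty} |\Phi(\omega + 2k\pi)|^{2} = 1$ almost everywhere.

Solution: It is obvious that the Fourier transform of $\phi_{0,n}(t)$
reads $\Phi_{0,n}(\omega) = e^{-in\omega}\, \Phi(\omega)$. In view of the Parseval identity for
the Fourier transform, we have

\begin{align*}
\int_{-\infty}^{\infty} \phi_{0,n}(t)\, \phi_{0,m}(t)\, dt
&= \int_{-\infty}^{\infty} \phi_{0,0}(t)\, \phi_{0,m-n}(t)\, dt \\
&= \frac{1}{2\pi} \int_{-\infty}^{\infty} \Phi_{0,0}(\omega)\, \Phi_{0,m-n}^{*}(\omega)\, d\omega
 = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{i(m-n)\omega}\, |\Phi_{0,0}(\omega)|^{2}\, d\omega \\
&= \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \int_{2\pi k}^{2\pi(k+1)}
   e^{i(m-n)\omega}\, |\Phi_{0,0}(\omega)|^{2}\, d\omega \\
&= \frac{1}{2\pi} \int_{0}^{2\pi} e^{i(m-n)\omega}
   \sum_{k=-\infty}^{\infty} |\Phi_{0,0}(\omega + 2\pi k)|^{2}\, d\omega.
\end{align*}

It thus follows from the completeness of $\{e^{-in\omega},\ n \in \mathbb{Z}\}$ in
$L^2(0, 2\pi)$ that $\int_{-\infty}^{\infty} \phi_{0,n}(t)\, \phi_{0,m}(t)\, dt = \delta_{n,m}$ if and only if

\[
\sum_{k=-\infty}^{\infty} |\Phi_{0,0}(\omega + 2\pi k)|^{2} = 1 \quad \text{almost everywhere}.
\]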


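3. Let $\Phi(\omega)$ be the Fourier transform of the scaling function $\phi(t)$ and let $p_n$
be its scaling function coefficient. Prove that

$$\Phi(\omega) = M\left(\frac{\omega}{2}\right)\Phi\left(\frac{\omega}{2}\right) \quad \text{with} \quad M(\omega) = \frac{1}{\sqrt{2}}\sum_{n=-\infty}^{\infty} p_n e^{-in\omega}. \tag{14.42}$$

Solution: Since $\phi(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} p_n\phi(2t - n)$, we have

$$\Phi(\omega) = \sqrt{2}\sum_{n=-\infty}^{\infty} p_n\int_{-\infty}^{\infty} \phi(2t - n)e^{-i\omega t}\,dt$$
$$= \frac{\sqrt{2}}{2}\sum_{n=-\infty}^{\infty} p_n e^{-in\omega/2}\int_{-\infty}^{\infty} \phi(t')e^{-i\omega t'/2}\,dt' \qquad (t' = 2t - n)$$
$$= \frac{1}{\sqrt{2}}\sum_{n=-\infty}^{\infty} p_n e^{-in\omega/2}\,\Phi\left(\frac{\omega}{2}\right) = M\left(\frac{\omega}{2}\right)\Phi\left(\frac{\omega}{2}\right). \ ♣$$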


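4. Let $f(t)$ be a function $f \in W_0 = V_1 \ominus V_0$ for a given multiresolution analysis
$\{V_j\}$. Prove that its Fourier transform $F(\omega)$ necessarily takes the form

$$F(\omega) = \nu(\omega)\,e^{i\omega/2}\,M^{*}\left(\frac{\omega}{2} + \pi\right)\Phi\left(\frac{\omega}{2}\right), \tag{14.43}$$
where $\nu(\omega) = \nu(\omega + 2\pi)$.

Solution: Since $f \in W_0$ and $V_1 = V_0 \oplus W_0$, it follows that
$f \in V_1$ and is orthogonal to $V_0$. Hence, we can write $f(t) =
\sum_{n=-\infty}^{\infty} c_n\phi_{1,n}(t) = \sqrt{2}\sum_{n=-\infty}^{\infty} c_n\phi(2t - n)$, where $c_n =
\int_{-\infty}^{\infty} f(t)\phi_{1,n}^{*}(t)\,dt$. Take the Fourier transform of both sides to
obtain

$$F(\omega) = M_f\left(\frac{\omega}{2}\right)\Phi\left(\frac{\omega}{2}\right) \quad \text{with} \quad M_f(\omega) = \frac{1}{\sqrt{2}}\sum_{n=-\infty}^{\infty} c_n e^{-in\omega}. \tag{14.44}$$

Evidently, $M_f(\omega)$ is a $2\pi$-periodic function belonging to $L^2(0, 2\pi)$.
Since $f$ is orthogonal to $V_0$, we have $\int_{-\infty}^{\infty} F(\omega)\Phi^{*}(\omega)e^{in\omega}\,d\omega = 0$,
so

$$\int_{0}^{2\pi}\left[\sum_{k=-\infty}^{\infty} F(\omega + 2k\pi)\Phi^{*}(\omega + 2k\pi)\right]e^{in\omega}\,d\omega = 0.$$

Consequently, $\sum_{k=-\infty}^{\infty} F(\omega + 2k\pi)\Phi^{*}(\omega + 2k\pi) = 0$. Substituting
(14.42) and (14.44) into this result, we obtain

$$\sum_{k=-\infty}^{\infty} M_f\left(\frac{\omega}{2} + k\pi\right)M^{*}\left(\frac{\omega}{2} + k\pi\right)\left|\Phi\left(\frac{\omega}{2} + k\pi\right)\right|^2 = 0.$$

Meanwhile we denote $M_f(\omega)M^{*}(\omega)$ and $|\Phi(\omega)|^2$ by $M_2(\omega)$ and
$\Phi_2(\omega)$, respectively. By splitting the sum into even and odd integers
$k$ and then employing the $2\pi$-periodicity of $M(\omega)$ and $M_f(\omega)$ [and
thus $M_2(\omega)$], we have

$$0 = \sum_{k=-\infty}^{\infty} M_2\left(\frac{\omega}{2} + 2k\pi\right)\Phi_2\left(\frac{\omega}{2} + 2k\pi\right)
+ \sum_{k=-\infty}^{\infty} M_2\left[\frac{\omega}{2} + (2k+1)\pi\right]\Phi_2\left[\frac{\omega}{2} + (2k+1)\pi\right]$$
$$= M_2\left(\frac{\omega}{2}\right) + M_2\left(\frac{\omega}{2} + \pi\right), \tag{14.45}$$

where we used the orthonormality condition
$\sum_{k} \Phi_2(\omega + 2k\pi) = 1$ with respect to the set
of scaling functions $\{\phi_{0,n}(t)\}$. Finally, replacing $\omega$ in the last line
in (14.45) by $2\omega$ gives

$$\begin{vmatrix} M_f(\omega) & M^{*}(\omega + \pi) \\ -M_f(\omega + \pi) & M^{*}(\omega) \end{vmatrix} = 0, \tag{14.46}$$

which indicates the linear dependence of two vectors: $[M_f(\omega), -M_f(\omega + \pi)]$
and $[M^{*}(\omega + \pi), M^{*}(\omega)]$. Hence, there exists a function
$\lambda(\omega)$ such that

$$M_f(\omega) = \lambda(\omega)M^{*}(\omega + \pi). \tag{14.47}$$

Since both $M$ and $M_f$ are $2\pi$-periodic, so is $\lambda$. Further, substituting
(14.47) into (14.46) yields

$$\lambda(\omega) + \lambda(\omega + \pi) = 0, \tag{14.48}$$
which means that there exists a function $\nu(\omega)$ such that
$$\lambda(\omega) = e^{i\omega}\nu(2\omega) \quad \text{and} \quad \nu(\omega) = \nu(\omega + 2\pi).$$

Eventually, the results (14.44), (14.47), and (14.48) lead to the
desired representation (14.43). ♣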


14.3 Fast Wavelet Transformation 


14.3.1 Generalized Two-Scale Relations 
We know that a signal $x(t) \in L^2(\mathbb{R})$ can be represented in terms of the
continuous approximation $S_{m,n}$ and the discrete wavelet transform $T_{m,n}$ by

$$x(t) = \sum_{n=-\infty}^{\infty} S_{m_0,n}\phi_{m_0,n}(t) + \sum_{m=m_0}^{\infty}\sum_{n=-\infty}^{\infty} T_{m,n}\psi_{m,n}(t),$$

where

$$\phi_{m,n}(t) = 2^{m/2}\phi(2^{m}t - n) \quad \text{and} \quad \psi_{m,n}(t) = 2^{m/2}\psi(2^{m}t - n). \tag{14.49}$$
[See (14.24) and (14.26).] In principle, both expansion coefficients $S_{m,n}$ and
$T_{m,n}$ can be computed through the convolution integrals defined by

$$S_{m,n} = \int_{-\infty}^{\infty} x(t)\phi_{m,n}(t)\,dt \quad \text{and} \quad T_{m,n} = \int_{-\infty}^{\infty} x(t)\psi_{m,n}(t)\,dt. \tag{14.50}$$

Actual computations of these integrals are very time-consuming. However,
there is an efficient method for computing $S_{m,n}$ and $T_{m,n}$ at all $m$, known as
the fast wavelet transform. This sophisticated method is based on recursive
equations for $S_{m,n}$ and $T_{m,n}$ and thus is markedly suitable for numerical
computations of wavelet analyses.

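To proceed with the argument, we need some preliminary results. We know
that the father wavelet $\phi(t)$ and the mother wavelet $\psi(t)$ can be described by
a linear combination of contracted and shifted versions of $\phi(t)$ as follows:
$$\phi(t) = \sum_{k=-\infty}^{\infty} p_k\phi(2t - k) \quad \text{and} \quad \psi(t) = \sum_{k=-\infty}^{\infty} (-1)^{k}p_{1-k}\phi(2t - k),$$
where $p_k$ is the scaling function coefficient of $\phi(t)$. Here the two-scale
relation is written without the prefactor $\sqrt{2}$, so the coefficients satisfy
$\sum_k p_k = 2$ and factors of $1/\sqrt{2}$ appear in the relations below. For
convenience, we use an
alternative definition $q_n = (-1)^{n}p_{1-n}$ of the wavelet coefficient $q_n$ instead of
the one used in (14.30). These facts immediately result in

$$\phi_{m,n}(t) = 2^{m/2}\phi(2^{m}t - n) = 2^{m/2}\sum_{k=-\infty}^{\infty} p_k\phi[2(2^{m}t - n) - k]$$
$$= \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k\,2^{(m+1)/2}\phi(2^{m+1}t - 2n - k)$$
$$= \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k\phi_{m+1,2n+k}(t), \tag{14.51}$$
and similarly,
$$\psi_{m,n}(t) = \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_k\phi_{m+1,2n+k}(t). \tag{14.52}$$

The expressions (14.51) and (14.52) are generalizations of (14.27) and (14.31)
applicable for $\phi(t)$ and $\psi(t)$.


♠ Generalized two-scale relations:
Given a multiresolution analysis, $\phi_{m,n}(t)$ and $\psi_{m,n}(t)$ are obtained from
the set of functions $\{\phi_{m+1,2n+k}(t);\ -\infty < k < \infty\}$ by

$$\phi_{m,n}(t) = 2^{-1/2}\sum_{k=-\infty}^{\infty} p_k\phi_{m+1,2n+k}(t),$$
$$\psi_{m,n}(t) = 2^{-1/2}\sum_{k=-\infty}^{\infty} q_k\phi_{m+1,2n+k}(t).$$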


14.3.2 Decomposition Algorithm 


The fast wavelet transform consists of two main parts, called, respectively,
the decomposition algorithm and the reconstruction algorithm, each
of which gives a recursive relation between the approximation coefficients $S_{m,n}$
and the wavelet coefficients $T_{m,n}$ at neighboring scales. This subsection focuses
on the former algorithm, and the next subsection deals with the latter.


Remark. In the literature about the fast wavelet transform, all of the terms 
below mean the same thing: 


- discrete wavelet transform 

- decomposition/reconstruction algorithm 
- fast orthogonal wavelet transform

- multiresolution algorithm 

- pyramid algorithm 

- tree algorithm 


The decomposition algorithm enables us to obtain $S_{m,n}$ and $T_{m,n}$ at all $m$
smaller than a prescribed scale $m_0$, once $S_{m_0,n}$ is given. To attain our
objective, we first derive a recursive formula for $S_{m,n}$ at two different scales, i.e.,
$S_{m,n}$ and $S_{m+1,n}$. From the expansion (14.49) and from the orthonormality
of $\phi_{m,n}(t)$, it follows that

$$S_{m,n} = \int_{-\infty}^{\infty} x(t)\phi_{m,n}(t)\,dt.$$

Using the generalized two-scale relation (14.51), we can write

$$S_{m,n} = \int_{-\infty}^{\infty} x(t)\left[\frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k\phi_{m+1,2n+k}(t)\right]dt$$
$$= \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k\left[\int_{-\infty}^{\infty} x(t)\phi_{m+1,2n+k}(t)\,dt\right]$$
$$= \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k S_{m+1,2n+k}.$$

Replacing the summation index $k$ with $k - 2n$, we obtain

$$S_{m,n} = \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_{k-2n}S_{m+1,k}, \tag{14.53}$$

which provides the approximation coefficients $S_{m,n}$ from $S_{m+1,n}$.
Similarly, the wavelet coefficients $T_{m,n}$ can be found from the approximation
coefficients at the previous scale:

$$T_{m,n} = \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_k S_{m+1,2n+k} = \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_{k-2n}S_{m+1,k}. \tag{14.54}$$

As a consequence, if we know the approximation coefficients $S_{m_0,n}$ at a specific
scale $m_0$ then, through repeated application of (14.53) and (14.54), we can
generate $S_{m,n}$ and $T_{m,n}$ at all $m < m_0$. This procedure, called the
decomposition algorithm and based on (14.53) and (14.54), is the first half of
the fast wavelet transform. It allows us to compute the wavelet coefficients
efficiently, rather than computing them laboriously from the convolution of
(14.50).


14.3.3 Reconstruction Algorithm 


We can go in the opposite direction and reconstruct $S_{m+1,n}$ from $S_{m,n}$ and
$T_{m,n}$. We already know from (14.39) that $x_{m+1}(t) = x_m(t) + z_m(t)$, and we
can expand this as

$$x_{m+1}(t) = \sum_{n=-\infty}^{\infty} S_{m,n}\phi_{m,n}(t) + \sum_{n=-\infty}^{\infty} T_{m,n}\psi_{m,n}(t).$$

Furthermore, using (14.51) and (14.52), we can expand this equation in terms
of the scaling function at scale $m + 1$:

$$x_{m+1}(t) = \sum_{n=-\infty}^{\infty} S_{m,n}\frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_k\phi_{m+1,2n+k}(t)
+ \sum_{n=-\infty}^{\infty} T_{m,n}\frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_k\phi_{m+1,2n+k}(t).$$

Rearranging the summation indices, we get

$$x_{m+1}(t) = \sum_{n=-\infty}^{\infty} S_{m,n}\frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_{k-2n}\phi_{m+1,k}(t)
+ \sum_{n=-\infty}^{\infty} T_{m,n}\frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_{k-2n}\phi_{m+1,k}(t). \tag{14.55}$$

We also know that we can expand $x_{m+1}(t)$ in terms of the approximation
coefficients at scale $m + 1$, i.e.,

$$x_{m+1}(t) = \sum_{k=-\infty}^{\infty} S_{m+1,k}\phi_{m+1,k}(t). \tag{14.56}$$

Equating the coefficients in (14.56) with (14.55) yields the reconstruction
algorithm:

$$S_{m+1,n} = \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} p_{n-2k}S_{m,k} + \frac{1}{\sqrt{2}}\sum_{k=-\infty}^{\infty} q_{n-2k}T_{m,k},$$

where we have swapped the indices $k$ and $n$. Hence, at the scale $m + 1$, the
approximation coefficients $S_{m+1,n}$ can be found in terms of a combination
of $S_{m,n}$ and $T_{m,n}$ at the neighboring coarser scale $m$. The reconstruction
algorithm is the second half of the fast wavelet transform.
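
The two halves of the fast wavelet transform can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the Haar coefficients $p_0 = p_1 = 1$ (normalized so that $\sum_k p_k = 2$), treats the index of the finite coefficient list periodically, and uses hypothetical function names (`wavelet_filter`, `decompose`, `reconstruct`) chosen here for readability.

```python
import math

def wavelet_filter(p):
    """Build q_k = (-1)^k p_{1-k} from scaling coefficients p (dict: index -> value)."""
    return {1 - j: (1 if (1 - j) % 2 == 0 else -1) * pj for j, pj in p.items()}

def decompose(s_fine, p):
    """One decomposition step: S and T at scale m from S at scale m+1.

    Implements S_{m,n} = (1/sqrt 2) sum_k p_k S_{m+1,2n+k} and the analogous
    formula for T_{m,n}; indices wrap periodically (an assumption made here).
    """
    q = wavelet_filter(p)
    n_fine = len(s_fine)  # assumed even
    s, t = [], []
    for n in range(n_fine // 2):
        s.append(sum(pk * s_fine[(2 * n + k) % n_fine] for k, pk in p.items()) / math.sqrt(2))
        t.append(sum(qk * s_fine[(2 * n + k) % n_fine] for k, qk in q.items()) / math.sqrt(2))
    return s, t

def reconstruct(s, t, p):
    """One reconstruction step: S at scale m+1 from S and T at scale m."""
    q = wavelet_filter(p)
    n_fine = 2 * len(s)
    s_fine = [0.0] * n_fine
    for n in range(len(s)):
        for k, pk in p.items():
            s_fine[(2 * n + k) % n_fine] += pk * s[n] / math.sqrt(2)
        for k, qk in q.items():
            s_fine[(2 * n + k) % n_fine] += qk * t[n] / math.sqrt(2)
    return s_fine

haar = {0: 1.0, 1: 1.0}          # Haar scaling coefficients (sum equals 2)
signal = [4.0, 6.0, 10.0, 12.0]  # plays the role of S_{m0,n}
s, t = decompose(signal, haar)
restored = reconstruct(s, t, haar)
```

For the Haar case each approximation coefficient is a scaled pairwise sum and each wavelet coefficient a scaled pairwise difference, and one reconstruction step recovers the input list up to rounding.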


Part V 


Differential Equations 


15 


Ordinary Differential Equations 


Abstract The main objective of this chapter is to ensure that the reader
understands the “existence theorem” (Sect. 15.2.3) and the “uniqueness theorem” (Sect. 15.2.4)
for a first-order ordinary differential equation. These theorems prove the existence
and uniqueness of a solution of the differential equation and delineate the conditions
that should be satisfied by the functions that are to be differentiated.


15.1 Concepts of Solutions 


15.1.1 Definition of Ordinary Differential Equations 


Many physical laws are often formulated as ordinary differential equa- 
tions (ODEs) whose unknowns are functions of a single variable. Below are 
basic notation and several important theorems that are used throughout this 
chapter. We start with the formal definition of ODEs. 


♠ Ordinary differential equations:
An ordinary differential equation of order $n$ is an equation


$$F\left[x, y(x), y'(x), \cdots, y^{(n)}(x)\right] = 0 \tag{15.1}$$


that is satisfied by the function $y(x)$ and its derivatives
$y'(x), y''(x), \cdots, y^{(n)}(x)$ with respect to a single independent variable $x$.


Here, the order of a differential equation means the largest positive integer n 
for which an nth derivative appears in equation (15.1). For instance, a general 
form of the first-order differential equations is given by 


$$F\left[x, y(x), y'(x)\right] = 0, \tag{15.2}$$




where F is a single-valued function on its arguments in some domain D. 
Hereafter we restrict our attention to the case where x is a real number. 


Remark. 


1. An ODE (15.1) is called a linear ODE if it is linear in the unknown
function $y(x)$ and in all its derivatives; otherwise, it is nonlinear.

2. A linear ODE of order $n$ is said to be homogeneous if it is of the form
$a_n(x)y^{(n)} + a_{n-1}(x)y^{(n-1)} + \cdots + a_1(x)y' + a_0(x)y = 0$, where there is
no term that contains a function of $x$ alone.

3. The term homogeneous may have a totally different meaning specifically
when an ODE is first order, which occurs if the ODE is written in
the form
$$\frac{dy}{dx} = F\left(\frac{y}{x}\right). \tag{15.3}$$

Such equations can be solved in closed form by a change of variables
$u = y/x$, which transforms the equation into the separable equation

$$\frac{du}{F(u) - u} = \frac{dx}{x}. \tag{15.4}$$
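
As a brief worked illustration of the substitution $u = y/x$, consider the equation $y' = 1 + y/x$. This particular equation is chosen here only as an example and does not come from the text; it corresponds to $F(u) = 1 + u$.

```latex
% Substituting y = u x gives y' = u + x u', so y' = F(y/x) becomes x u' = F(u) - u.
\begin{align*}
  F(u) - u &= 1
    \quad\Longrightarrow\quad du = \frac{dx}{x}, \\
  u &= \log|x| + c, \\
  y &= u x = x\log|x| + c x .
\end{align*}
% Check: y' = \log|x| + 1 + c = 1 + y/x for x \neq 0.
```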


15.1.2 Explicit Solution 


Let $y = \varphi(x)$ define $y$ as a function of $x$ on an interval $I = (a, b)$. We say that
the function $\varphi(x)$ is an explicit solution or a simple solution of the ODE
(15.1) if it satisfies the equation for every $x$ in $I$. In mathematical symbols,
this definition reads as follows:


♠ Explicit solution of an ODE:
A function $y = \varphi(x)$ defined on an interval $I$ is a solution of the ODE
(15.1) if


$$F\left[x, \varphi(x), \varphi'(x), \cdots, \varphi^{(n)}(x)\right] = 0$$
for every $x$ in $I$.

Note that a real function should be a correspondence between two sets of real
numbers. In this context, if an equation involving $x$ and $y$ does not define
a real function, then it is not a solution of any ODE even if the equation
formally satisfies the ODE. For example, the equation


$$y = \sqrt{-(1 + x^2)} \tag{15.5}$$


does not define a real function; therefore, it is not a solution of the ODE
$$x + yy' = 0 \tag{15.6}$$
even though the formal substitution of (15.5) into (15.6) yields an identity.
Examples 1. The function
$$y = \log x + c, \quad x > 0$$
is a solution of $y' = 1/x$ for all $x > 0$.


2. The function

$$y = \tan x - x, \quad x \ne \frac{2n+1}{2}\pi \quad (n = 0, \pm 1, \pm 2, \cdots) \tag{15.7}$$

is a solution of

$$y' = (x + y)^2. \tag{15.8}$$
In fact, the substitution of $y$ into (15.8) gives the identity $\tan^2 x =
(x + \tan x - x)^2 = \tan^2 x$ in each of the intervals specified in (15.7).


Remark. Note that the ODE (15.8) is defined for all $x$, but its solution
(15.7) is not defined for all $x$. Hence, an interval on which the function
given by (15.7) is a solution of (15.8) must lie within one of the intervals
specified in (15.7).


3. The function $y = |x|$ is a solution of
$$y' = 1 \quad \text{in the interval } x > 0,$$
and is also a solution of

$$y' = -1 \quad \text{in the interval } x < 0.$$


Remark. Observe that the function $y = |x|$ is defined for all $x$, whereas the
corresponding ODEs are defined only on restricted intervals of $x$, in contrast
to Example 2.


15.1.3 Implicit Solution 


It is sometimes not easy (or even impossible) to solve an equation of the form
$g(x, y) = 0$ for $y$ in terms of $x$. However, whenever it can be shown that an
implicit function does satisfy a given ODE on an interval $I$, then the relation
$g(x, y) = 0$ is called an implicit solution of the ODE. A formal definition is
given below.




♠ Implicit solution of an ODE:
A relation $g(x, y) = 0$ is an implicit solution of an ODE


$$F\left[x, y(x), y'(x), \cdots, y^{(n)}(x)\right] = 0$$


on an interval $I$ if:


1. There exists a function $h(x)$ defined on $I$ such that $g(x, h(x)) = 0$ for
every $x$ in $I$.


2. $F\left[x, h(x), h'(x), \cdots, h^{(n)}(x)\right] = 0$ for every $x$ in $I$.


Remark. It must be cautioned that $g(x, y) = 0$ is merely an equation, and it is
thus never a precise solution of an ODE, as only a function can be a solution
of an ODE. What we mean in the above definition is that the function $h(x)$
defined by the relation $g(x, y) = 0$ is the solution of the ODE.


Examples The equation

$$g(x, y) = x^2 + y^2 - 25 = 0$$
is an implicit solution of the ODE

$$F(x, y, y') = yy' + x = 0$$


on the interval $I$: $-5 < x < 5$. In fact, the function $h(x) = \sqrt{25 - x^2}$ defined
on $I$ yields


$$F[x, h(x), h'(x)] = \sqrt{25 - x^2}\left(\frac{-x}{\sqrt{25 - x^2}}\right) + x = 0$$


for every $x$ on $I$.
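
The second condition of the definition can also be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the upper branch $h(x) = \sqrt{25 - x^2}$, approximates $h'(x)$ by a central difference with an arbitrarily chosen step, and evaluates the residual $h h' + x$ at a few sample points inside $I$. The names `h`, `residual`, and `samples` are hypothetical.

```python
import math

def h(x):
    """Upper branch of x^2 + y^2 - 25 = 0 on -5 < x < 5."""
    return math.sqrt(25.0 - x * x)

def residual(x, step=1e-6):
    """Evaluate F(x, h, h') = h*h' + x with h' from a central difference."""
    dh = (h(x + step) - h(x - step)) / (2.0 * step)
    return h(x) * dh + x

samples = [-4.5, -3.0, 0.0, 1.0, 4.0]   # points well inside the interval I
residuals = [residual(x) for x in samples]
```

The residuals are zero up to the discretization and rounding error of the finite difference, which is what the analytic substitution predicts.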


15.1.4 General and Particular Solutions 


We next observe that an ODE in general has many solutions. For example,
the ODE

$$y' = e^x$$
can be solved as
$$y = e^x + c, \tag{15.9}$$
where $c$ can take any numerical value. Similarly, if
$$y''' = e^x, \tag{15.10}$$

then its solution, obtained by integrating three times, is

$$y = e^x + c_1x^2 + c_2x + c_3, \tag{15.11}$$

where $c_1, c_2, c_3$ can take on any numerical values. Note that both (15.9) and
(15.11) express infinitely many solutions since the $c$'s, which are constants,
can have infinitely many values. Figure 15.1 is a geometrical interpretation of
this point. Each curve corresponds to a solution (15.11) for $c_2 = -5, 1, 4$ and
$c_3 = -2, 1, 3$ while $c_1 = 1$ is fixed.




Fig. 15.1. Family of the infinitely many solutions (15.11) of the differential equation
(15.10). Solid and dotted curves correspond to $c_3 = 1$ and $c_3 = 3$, respectively


The two examples above illustrate that solutions of an ODE may often
be represented by a single equation involving arbitrary constants. Such
a function involving arbitrary constants is called a general solution (or
complete integral or primitive integral) of an ODE. Geometrically, these
are infinitely many curves, one for each set of values of the $c$'s. If we choose
specific values of the $c$'s, we obtain what is called a particular solution of
that ODE.


Remark. From the examples above, the reader might assume that 


(i) an ODE always has infinitely many solutions, or that 
(ii) a solution of an nth order ODE always contains n arbitrary constants. 


However, these two conjectures are false. For instance, 


- The equation $(y'')^2 + y^2 = 0$ has only one solution $y = 0$ that possesses
no arbitrary constant.
- The equation $|y'| + 1 = 0$ has no solution.
- The first-order equation $(y' - y)(y' - 2y) = 0$ has the solution
$(y - c_1e^{x})(y - c_2e^{2x}) = 0$ that has two (not one) arbitrary constants.


15.1.5 Singular Solution 
Consider an ODE of the form
$$y - xy' = f(y'), \tag{15.12}$$

which is known as a Clairaut equation. We solve it by differentiating both
sides to yield
$$y''\left[f'(y') + x\right] = 0.$$

We thus have two possibilities. If we set $y'' = 0$, then $y = ax + b$ so that
substitution back into the original equation (15.12) gives $b = f(a)$. Thus we
have a general solution:

$$y = ax + f(a),$$
where $a$ is an arbitrary constant. On the other hand, if we set
$$f'(y') + x = 0, \tag{15.13}$$

then eliminating $y'$ between (15.13) and the original equation gives us a
solution with no arbitrary constant, which is known as a singular solution.
There are various other types of singular solutions, one of which is given below.


Examples Suppose the Clairaut equation to be of the form
$$y = xy' + (y')^2$$
and differentiate both sides to obtain
$$y''(x + 2y') = 0.$$
If we set $y'' = 0$, then the general solution reads
$$y = cx + c^2 \tag{15.14}$$

with an arbitrary constant $c$. However, if we choose the possibility that $2y' +
x = 0$, then we have
$$x^2 + 4y = 0. \tag{15.15}$$


Remark. Geometrically, the singular solution (15.15) is an envelope of the
family of integral curves defined by the general solution (15.14), as depicted
in Fig. 15.2. The dotted parabola is the singular solution and the straight
lines tangent to the parabola are the general solution.


Fig. 15.2. The singular solution (15.15) is an envelope of the family of integral
curves (see Sect. 15.1.6), which are defined by the general solution (15.14)
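
The envelope property can be verified directly for the example $y = xy' + (y')^2$. The following sketch is an illustration only, with hypothetical helper names: for each sampled value of $c$ it checks that the line $y = cx + c^2$ meets the parabola $y = -x^2/4$ at $x = -2c$ with the same slope, and that the parabola itself satisfies the Clairaut equation.

```python
def line(c, x):
    """Member of the general solution y = c x + c^2."""
    return c * x + c * c

def parabola(x):
    """Singular solution x^2 + 4 y = 0, i.e. y = -x^2 / 4."""
    return -x * x / 4.0

def parabola_slope(x):
    """Derivative of the singular solution."""
    return -x / 2.0

def clairaut_residual(x, y, dy):
    """Residual of y = x y' + (y')^2."""
    return y - x * dy - dy * dy

contacts = []
for c in [-3.0, -1.0, 0.0, 0.5, 2.0]:
    x_touch = -2.0 * c                       # point of tangency for this line
    same_point = abs(line(c, x_touch) - parabola(x_touch)) < 1e-12
    same_slope = abs(c - parabola_slope(x_touch)) < 1e-12
    contacts.append((c, x_touch, same_point, same_slope))

singular_ok = all(
    abs(clairaut_residual(x, parabola(x), parabola_slope(x))) < 1e-12
    for x in [-4.0, -1.0, 0.0, 2.0, 6.0]
)
```

Each line of the family touches the parabola at exactly one point, which is the geometric content of the word envelope.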


15.1.6 Integral Curve and Direction Field 


Before closing this section, we must emphasize the geometric significance of
a solution of a first-order ODE. In many practical problems, a rough
geometrical approximation to a solution may be all that is needed rather than an
evaluation of its explicit functional form. Let

$$y = f(x)$$

define a function of $x$ whose derivative $y'$ exists on an interval $I$: $a < x < b$.
Then $y'$ gives the direction of the tangent to the curve at each of these points.
Therefore, finding a solution for


$$y' = F(x, y), \quad a < x < b \tag{15.16}$$


can be reduced to finding a curve on the ($x$-$y$)-plane whose slope at each of
its points is given by (15.16). The relevant terminology is given below.


♠ Integral curve:
If a curve $y = f(x)$ [or $g(x, y) = 0$] satisfies a first-order ODE (15.16) on
an interval $I$, then the graph of this function is called an integral curve.


Obviously, an integral curve is the graph of a function that is a solution of
a first-order ODE (15.16). Therefore, even if we cannot find an elementary
function that is a solution of (15.16), we can draw a small line element at
any point on the ($x$-$y$)-plane for which $x$ is in $I$ to represent the slope of an
integral curve. If this line is short enough, the curve itself over that length
resembles the line. These lines are called line elements and an ensemble of
such lines is called a direction field.
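
A direction field is easy to tabulate, and chaining short line elements gives a rough integral curve. The sketch below is an illustration under stated assumptions: it uses the right-hand side $F(x, y) = (x + y)^2$ of (15.8), follows the line elements with Euler's method (a standard technique that the text itself does not name), and compares the result with the explicit solution $y = \tan x - x$. The function names and grid values are hypothetical.

```python
import math

def slope(x, y):
    """Right-hand side F(x, y) of (15.16); here the example y' = (x + y)^2."""
    return (x + y) ** 2

def direction_field(xs, ys):
    """Line elements as (x, y, slope) triples on a rectangular grid."""
    return [(x, y, slope(x, y)) for x in xs for y in ys]

def follow_line_elements(x0, y0, x_end, steps):
    """Approximate an integral curve by chaining short line elements (Euler's method)."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        y += h * slope(x, y)   # move along the local line element
        x += h
    return y

field = direction_field([0.0, 0.25, 0.5], [-0.5, 0.0, 0.5])
approx = follow_line_elements(0.0, 0.0, 0.5, 1000)
exact = math.tan(0.5) - 0.5    # value of the explicit solution at x = 0.5
```

With shorter line elements (more steps) the polygon approaches the integral curve through the starting point more closely.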


Exercises 


1. Test whether the relation
$$xy^2 - e^{-y} - 1 = 0 \tag{15.17}$$
is an implicit solution of the ODE
$$(xy^2 + 2xy - 1)\,y' + y^2 = 0. \tag{15.18}$$
Solution: If we blindly differentiate both sides to yield
$$2xyy' + y^2 + e^{-y}y' = 0 \quad \Longrightarrow \quad (2xy + e^{-y})\,y' + y^2 = 0$$

and then eliminate $e^{-y}$ from the final result by using (15.17), we
obtain the ODE (15.18). This implies the possibility that (15.17)
is an implicit solution of the ODE (15.18). The remaining task is,
therefore, to determine the interval $I$ on which we can define such
a function $y = h(x)$ that satisfies the relation (15.17) for every $x$
on $I$.

As a first step, we write (15.17) as

$$x = \frac{1 + e^{-y}}{y^2},$$

which says that $y$ is defined only for $x > 0$ since $e^{-y}$ is always
positive. Hence, the interval for which (15.17) may be a solution
of (15.18) must exclude values of $x \le 0$.

Next, we depict a graph of equation (15.17) on the ($x$-$y$)-plane
(see Fig. 15.3). From the graph, we see that there are three choices
for the function $y = h(x)$, each of which gives a one-to-one relation
between $x$ and $y$. If we choose the upper branch ($y > 0$), then we
can say that “(15.17) is an implicit solution of (15.18) for all $x >
0$.” If we choose either of the two lower branches (one is above the
dashed line and the other is below), then we can say that “(15.17)
is an implicit solution of (15.18) only for $x \ge x_0 \simeq 2.07$.” ♣
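
The threshold $x_0$ can be reproduced numerically. This sketch rests on an assumption about how the value was obtained: on the lower branches ($y = -t < 0$) it locates the minimum of $x(y) = (1 + e^{-y})/y^2$, where $dx/dy = 0$ reduces to $t = 2(1 + e^{-t})$, and solves that condition by bisection. The helper names and the bracketing interval $[2, 3]$ are choices made here.

```python
import math

def x_of_y(y):
    """Solve (15.17) for x: x = (1 + e^{-y}) / y^2, valid for y != 0."""
    return (1.0 + math.exp(-y)) / (y * y)

def stationary_condition(t):
    """dx/dy = 0 on the branch y = -t < 0 reduces to t - 2 (1 + e^{-t}) = 0."""
    return t - 2.0 * (1.0 + math.exp(-t))

lo, hi = 2.0, 3.0                  # the condition changes sign on this interval
for _ in range(60):                # plain bisection
    mid = 0.5 * (lo + hi)
    if stationary_condition(lo) * stationary_condition(mid) <= 0.0:
        hi = mid
    else:
        lo = mid
t_star = 0.5 * (lo + hi)
x0 = x_of_y(-t_star)               # smallest x reached by the lower branches
```

The two lower branches meet at $y = -t^{*}$, and no real $y < 0$ satisfies (15.17) for $x$ below the computed $x_0$.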




Fig. 15.3. The curve of the function (15.17) 


15.2 Existence Theorem for the First-Order ODE 


15.2.1 Picard Method 
In this section, we consider a first-order ODE of the form
$$y'(x) = f(x, y(x)), \tag{15.19}$$
where $f$ is some continuous function. Our main purpose is to prove that:
(i) a wide class of equations of the form (15.19) have solutions, and
(ii) solutions to initial value problems
$$y'(x) = f(x, y(x)), \quad y(x_0) = y_0$$

are unique. Statements (i) and (ii) are supported by the existence theorem
and the uniqueness theorem, respectively, as is demonstrated in the
subsequent subsections.

Our proof of the two theorems is based on what we call Picard's method,
which gives solutions of an initial value problem

$$y'(x) = f(x, y(x)), \quad y(x_0) = y_0, \tag{15.20}$$

where $f(x, y(x))$ is assumed to be continuous and real-valued in a rectangle:

$$R: \ |x - x_0| \le a, \quad |y - y_0| \le b \quad (a, b > 0). \tag{15.21}$$

The key to Picard's method is to replace the differential equation in (15.20)
by the equivalent integral form,

$$y(x) = y_0 + \int_{x_0}^{x} f(t, y(t))\,dt, \tag{15.22}$$

which is an integral equation because the unknown function $y(x)$ appears
in the integrand. That the integral equation (15.22) is equivalent to
the original initial value problem can be checked by differentiating (15.22)
with respect to $x$.


Remark. Note that the initial condition $y(x_0) = y_0$ is automatically included
in (15.22).


We now try to solve (15.22). As a crude approximation to a solution, we take
the constant function $\varphi_0(x) = y_0$, which clearly satisfies the initial condition

$$\varphi_0(x_0) = y_0,$$

whereas it does not satisfy (15.22) in general. Nevertheless, if we substitute
the constant function into $f(t, y(t))$ of (15.22), we have

$$\varphi_1(x) = y_0 + \int_{x_0}^{x} f(t, \varphi_0(t))\,dt, \tag{15.23}$$

which is a closer approximation to a solution than $\varphi_0(x)$. By continuing the
process, we have a sequence of functions $\{\varphi_n(x)\}$:


♠ Successive approximation:
Given an integral equation (15.22) with respect to $y(x)$, a set of functions
defined by
$$\varphi_0(x) = y_0,$$
$$\varphi_n(x) = y_0 + \int_{x_0}^{x} f(t, \varphi_{n-1}(t))\,dt \quad (n = 1, 2, \cdots) \tag{15.24}$$

is called a successive approximation to a solution of (15.22).


We understand intuitively that taking the limit $n \to \infty$ yields

$$\varphi_n(x) \to \varphi(x),$$

where $\varphi(x)$ is the exact solution of the integral equation (15.22). The
convergence property of the sequence $\{\varphi_n(x)\}$ and the equivalence of
the limit function $\varphi(x)$ to the solution of (15.22) are guaranteed if the
integrand $f(x, y(x))$ satisfies several conditions as is demonstrated in
Sect. 15.2.3.


In summary, we now know the following: 


♠ Picard method:

The differential equation $y'(x) = f(x, y(x))$ for a given initial value
$y(x_0) = y_0$ can be solved by starting with $\varphi_0(x) = y_0$ and then computing
successive approximations (15.24). The process converges to a solution of
the differential equation, provided that $f(x, y)$ satisfies several specific conditions
given in Sect. 15.2.3.
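
When $f$ is a polynomial in $y$, the successive approximations (15.24) can be carried out exactly with rational arithmetic. The sketch below is an illustration, not a general solver: it assumes the sample problem $y' = 1 + y^2$, $y(0) = 0$, represents each iterate as a list of polynomial coefficients, and uses hypothetical helper names.

```python
from fractions import Fraction

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists (a[k] multiplies t^k)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_integrate(a):
    """Integral from 0 to x of the polynomial with coefficients a."""
    return [Fraction(0)] + [ak / (k + 1) for k, ak in enumerate(a)]

def picard_step(phi):
    """phi_n(x) = y0 + int_0^x f(t, phi_{n-1}(t)) dt for f = 1 + y^2, x0 = 0, y0 = 0."""
    integrand = poly_mul(phi, phi)   # [phi_{n-1}(t)]^2
    integrand[0] += 1                # add the constant term of f = 1 + y^2
    return poly_integrate(integrand)

phi = [Fraction(0)]                  # phi_0(x) = y0 = 0
iterates = [phi]
for _ in range(3):
    phi = picard_step(phi)
    iterates.append(phi)
```

The third iterate has coefficients $1, \tfrac13, \tfrac{2}{15}, \tfrac{1}{63}$ for $x, x^3, x^5, x^7$, and its first three nonzero terms agree with the Taylor series of $\tan x$, the exact solution of this sample problem.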


15.2.2 Properties of Successive Approximations 


We have previously assumed that $f(x, y(x))$ is continuous in the rectangle $R$
defined in (15.21). Hereafter, we further assume that $f(x, y(x))$ is bounded on
$R$, which means the existence of a constant $M > 0$ such that

$$|f(x, y(x))| \le M \quad \text{for all } (x, y) \in R.$$

In this case, the successive approximations $\{\varphi_n(x)\}$ show both the continuity
and boundedness property stated below.


♠ Continuity of successive approximations:
Let $f(x, y)$ be continuous and bounded by $|f(x, y)| \le M$ in a rectangle

$$R: \ |x - x_0| \le a, \quad |y - y_0| \le b \quad (a, b > 0).$$
Then, the successive approximations $\varphi_n(x)$ are continuous on the interval
$$I: \ |x - x_0| \le c = \min\left[a, \frac{b}{M}\right].$$


♠ Boundedness of successive approximations:
Under the same conditions as above, the $\varphi_n(x)$ satisfy the inequality

$$|\varphi_n(x) - y_0| \le M|x - x_0|$$

for all $x$ in $I$.


Remark. The condition $|f(x, y)| \le M$ has an important geometric meaning
in terms of the direction field. Since $y' = f(x, y)$, the direction field $y'$ is
bounded as $|y'| \le M$, namely, $-M \le y' \le M$ for all points in $R$. Therefore,
a solution curve $y(x)$ that passes through $(x_0, y_0)$ must lie in the shaded
region in Fig. 15.4.


Proof (of the continuity). From (15.23), we have

$$|\varphi_1(x) - y_0| = \left|\int_{x_0}^{x} f(t, \varphi_0(t))\,dt\right| \le \left|\int_{x_0}^{x} |f(t, y_0)|\,dt\right| \le M|x - x_0|, \tag{15.25}$$

since $\varphi_0(t) = y_0$ and $|f(x, y_0)| \le M$. Now we tentatively assume that the
theorem is true for a function $\varphi_{n-1}$ with $n \ge 1$, and then prove inductively
that it is also true for $\varphi_n$. By hypothesis, all points $(t, \varphi_{n-1}(t))$ for $t$ in $I$ are
located within $R$. Hence, the function

$$F_{n-1}(t) \equiv f(t, \varphi_{n-1}(t))$$

exists for $t$ in $I$, which implies that

$$\varphi_n(x) = y_0 + \int_{x_0}^{x} F_{n-1}(t)\,dt$$

exists as a continuous function on $I$. ♣


Proof (of the boundedness). Since by hypothesis
$$|F_{n-1}(t)| = |f(t, \varphi_{n-1}(t))| \le M,$$

we have

$$|\varphi_n(x) - y_0| = \left|\int_{x_0}^{x} F_{n-1}(t)\,dt\right| \le \left|\int_{x_0}^{x} |F_{n-1}(t)|\,dt\right| \le M|x - x_0|.$$

Therefore, the boundedness of $\varphi_n(x)$ has been proved by induction. ♣


Fig. 15.4. Continuity and boundedness of a solution curve $y(x)$ on the interval
$I$: $|x - x_0| \le c = \min[a, b/M]$
15.2.3 Existence Theorem and Lipschitz Condition 


Let $f(x, y(x))$ be a function defined for $(x, y)$ in the rectangle $R$ in the
($x$-$y$)-plane. We would like to verify the existence of solutions for the
first-order ODEs expressed by

$$y'(x) = f(x, y(x))$$
by imposing a Lipschitz condition:

♠ Lipschitz condition:

We say that $f(x, y(x))$ satisfies a Lipschitz condition on a region $R$ if
there exists a constant $K > 0$ such that

$$|f(x, y(x)) - f(x, z(x))| \le K\,|y(x) - z(x)| \tag{15.26}$$

for all $(x, y), (x, z) \in R$. Here the positive constant $K$ is called the
Lipschitz constant.


Our most important theorem is presented below.


♠ Existence theorem:
Suppose that
1. $f(x, y)$ is continuous and real-valued on the rectangle $R$.
2. $|f(x, y)| \le M$ for all $(x, y)$ in $R$.
3. $f$ satisfies a Lipschitz condition with constant $K$ in $R$.

Then the initial value problem
$$y'(x) = f(x, y(x)), \quad y(x_0) = y_0 \tag{15.27}$$
has at least one solution $y(x)$ in the interval

$$I: \ |x - x_0| \le c = \min\left[a, \frac{b}{M}\right].$$


Proof Consider the successive approximations $\{\varphi_n(x)\}$ to a solution of the
initial value problem (15.27), wherein $f(x, y(x))$ is assumed to satisfy the
Lipschitz condition (15.26). We would like to prove that (i) the limit function

$$\varphi(x) = \lim_{n \to \infty} \varphi_n(x)$$

exists and (ii) that it is the solution of (15.27).


By definition of $\varphi_n(x)$, for $n \ge 1$, we obtain

$$|\varphi_{n+1}(x) - \varphi_n(x)| = \left|\int_{x_0}^{x} \left[f(t, \varphi_n(t)) - f(t, \varphi_{n-1}(t))\right]dt\right|$$
$$\le \left|\int_{x_0}^{x} |f(t, \varphi_n(t)) - f(t, \varphi_{n-1}(t))|\,dt\right|$$
$$\le K\left|\int_{x_0}^{x} |\varphi_n(t) - \varphi_{n-1}(t)|\,dt\right|. \tag{15.28}$$

Set $n = 1$ in (15.28) and substitute the result (15.25) into it to find

$$|\varphi_2(x) - \varphi_1(x)| \le KM\frac{|x - x_0|^2}{2!}. \tag{15.29}$$
Set $n = 2$ in (15.28) and use the result of (15.29) in the last term in (15.28).
Continuing the process, we have

$$|\varphi_n(x) - \varphi_{n-1}(x)| \le K^{n-1}M\frac{|x - x_0|^n}{n!}. \tag{15.30}$$
Observe that the right-hand side of (15.30) is the $n$th term of the power series
for $e^{K|x - x_0|}$ multiplied by $M/K$. This implies that the infinite series

$$\varphi_0(x) + \sum_{k=1}^{\infty} \left[\varphi_k(x) - \varphi_{k-1}(x)\right] \tag{15.31}$$

is absolutely (and thus ordinarily) convergent, ensuring the existence of the
limit function $\varphi(x) = \lim_{n \to \infty} \varphi_n(x)$. (See Sect. 3.2 for the convergence
properties of Cauchy sequences.)

Next we prove statement (ii) above. Note that the $n$th partial sum of
(15.31) is just $\varphi_n(x)$ and that the infinite series (15.31) equals the limit
function $\varphi(x)$. Hence, we have from (15.30) and (15.31) that

$$|\varphi(x) - \varphi_n(x)| = \left|\sum_{k=n+1}^{\infty} \left[\varphi_k(x) - \varphi_{k-1}(x)\right]\right|
\le \sum_{k=n+1}^{\infty} |\varphi_k(x) - \varphi_{k-1}(x)| \le \sum_{k=n+1}^{\infty} K^{k-1}M\frac{|x - x_0|^k}{k!}$$
$$\le \sum_{k=n+1}^{\infty} K^{k-1}M\frac{c^k}{k!} \le \frac{M}{K}\,e^{Kc}\,\alpha_n,$$
where
$$\alpha_n = \frac{(Kc)^{n+1}}{(n+1)!}.$$

Since $\alpha_n$ is a term of the power series of $e^{Kc}$, we have $\lim_{n \to \infty} \alpha_n = 0$.
Therefore, the sequence of functions $\{\varphi_n(x)\}$ converges uniformly to $\varphi(x)$ in
the interval $I$: $x \in [x_0 - c, x_0 + c]$, which together with the continuity of $f$ means that

$$\lim_{n \to \infty} f(x, \varphi_n(x)) = f(x, \varphi(x)). \tag{15.32}$$

That being so, we can write

$$\varphi(x) = \lim_{n \to \infty} \varphi_{n+1}(x) = y_0 + \lim_{n \to \infty} \int_{x_0}^{x} f(t, \varphi_n(t))\,dt$$
$$= y_0 + \int_{x_0}^{x} \lim_{n \to \infty} f(t, \varphi_n(t))\,dt$$
$$= y_0 + \int_{x_0}^{x} f(t, \varphi(t))\,dt. \tag{15.33}$$
By differentiating with respect to $x$, we have

$$\varphi'(x) = f(x, \varphi(x)), \quad \varphi(x_0) = y_0.$$

These ensure that $\varphi(x)$ is a solution of our initial value problem (15.27). ♣


15.2.4 Uniqueness Theorem 


Next we examine the uniqueness of the solution $\varphi(x)$ that we found earlier
using the Picard approximation method (see Sect. 15.2.1). This is described
by the theorem below.


♠ Uniqueness theorem:

Let $f(x, y)$ be continuous and satisfy the Lipschitz condition (15.26) in
the rectangle $R$. If $\varphi$ and $\psi$ are two solutions of the initial value problem
(15.27) in an interval $I$ containing $x_0$, then $\varphi(x) = \psi(x)$ for all $x$ in $I$.


Proof We assume that both $\varphi(x)$ and $\psi(x)$ are solutions of (15.27). For $x \ge x_0$,
we have from (15.33) and the Lipschitz condition (15.26) that

$$|\varphi(x) - \psi(x)| \le \int_{x_0}^{x} |f(t, \varphi(t)) - f(t, \psi(t))|\,dt
\le K\int_{x_0}^{x} |\varphi(t) - \psi(t)|\,dt. \tag{15.34}$$

This holds in the interval $I$: $x \in [x_0, x_0 + \delta]$ for arbitrarily small $\delta > 0$. Since
$|\varphi(x) - \psi(x)|$ is continuous in $I$, it attains a maximum value at some $x$ on $I$,
which we label $\mu$. Equation (15.34) provides that

$$\mu \le K\mu\,|x - x_0| \le K\mu\delta \quad \text{for all } x \text{ in } I, \tag{15.35}$$

so we have
$$(1 - K\delta)\,\mu \le 0.$$

Note that by definition $\mu \ge 0$. Hence, if $K\delta < 1$, we have $\mu = 0$, which says
that given any Lipschitz constant $K$, we can find a sufficiently small $\delta$ such
that

$$\max |\varphi(x) - \psi(x)| = 0,$$

i.e.,
$$|\varphi(x) - \psi(x)| = 0 \quad \text{for } x \in [x_0, x_0 + \delta].$$

Continuing this process yields the conclusion that $|\varphi(x) - \psi(x)| = 0$ for all $x$
in $I$. The same holds for the case $x < x_0$, completing the proof. ♣


15.2.5 Remarks on the Two Theorems 


1. The existence and uniqueness theorems only ensure the existence and
uniqueness of a solution. They neither tell us whether the solution can
be expressed in terms of elementary functions nor help us
to find the solution.

2. Arguments for real-valued functions given thus far are straightforwardly
extended to the case that $f$ is complex-valued. In this case we must admit
complex-valued solutions and $f$ must be defined for complex $z$. The set
of points $z$ satisfying $|z - z_0| \le b$ becomes a disk with center $z_0$ and
radius $b$, so the domain $R$ is no longer a rectangle.

3. The initial value problem

$$y'(x) = \sqrt{|y(x)|}, \quad y(0) = 0,$$

has two solutions,

$$y(x) = 0 \quad \text{and} \quad y(x) = \begin{cases} \dfrac{x^2}{4} & \text{if } x \ge 0, \\[4pt] -\dfrac{x^2}{4} & \text{if } x < 0, \end{cases}$$

although $f(x, y) = \sqrt{|y|}$ is continuous for all $y$. The Lipschitz condition is
violated in any region that includes the line $y = 0$ because for $y_1 = 0$ and
positive $y_2$ we have

$$\frac{|f(x, y_2) - f(x, y_1)|}{|y_2 - y_1|} = \frac{\sqrt{y_2}}{y_2} = \frac{1}{\sqrt{y_2}} \quad (\sqrt{y_2} > 0) \tag{15.36}$$

and this can be made as large as we please by choosing $y_2$ sufficiently small,
whereas the Lipschitz condition requires that the quotient on the left-hand
side of (15.36) not exceed a fixed constant $K$.
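
The growth of the quotient in (15.36) is easy to observe numerically. The sketch below is illustrative only: it evaluates the difference quotient with $y_1 = 0$ for $f = \sqrt{|y|}$ and, for comparison, for the smooth right-hand side $1 + y^2$ (a comparison case chosen here, for which the quotient stays bounded near $y = 0$). The function names are hypothetical.

```python
import math

def lipschitz_quotient(f, y1, y2):
    """Difference quotient |f(y2) - f(y1)| / |y2 - y1| at fixed x."""
    return abs(f(y2) - f(y1)) / abs(y2 - y1)

def root_rhs(y):
    """f(x, y) = sqrt(|y|), the right-hand side with two solutions through (0, 0)."""
    return math.sqrt(abs(y))

def smooth_rhs(y):
    """f(x, y) = 1 + y^2, a smooth comparison case."""
    return 1.0 + y * y

small = [10.0 ** (-k) for k in range(1, 7)]      # y2 = 0.1, 0.01, ..., 1e-6
root_q = [lipschitz_quotient(root_rhs, 0.0, y2) for y2 in small]
smooth_q = [lipschitz_quotient(smooth_rhs, 0.0, y2) for y2 in small]
```

The first list grows like $1/\sqrt{y_2}$ without bound as $y_2 \to 0$, so no constant $K$ can serve near $y = 0$, while the second list stays below 1 on the same sample.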


Exercises 


1. Using the Picard method, evaluate the successive approximation to the
solution of the initial value problem

$$y'(x) = 1 + y(x)^2, \quad y(0) = 0.$$

Solution: Set $x_0 = 0$, $y_0 = 0$, $f(x, y) = 1 + y^2$ in (15.24) to find
that

$$\varphi_n(x) = \int_{0}^{x} \left\{1 + [\varphi_{n-1}(t)]^2\right\}dt = x + \int_{0}^{x} [\varphi_{n-1}(t)]^2\,dt.$$

Hence, we obtain

$$\varphi_1(x) = x + \int_{0}^{x} 0\,dt = x, \qquad \varphi_2(x) = x + \int_{0}^{x} t^2\,dt = x + \frac{x^3}{3},$$
$$\varphi_3(x) = x + \int_{0}^{x} \left(t + \frac{t^3}{3}\right)^2 dt = x + \frac{x^3}{3} + \frac{2}{15}x^5 + \frac{1}{63}x^7, \quad \text{and so on. ♣}$$

Remark. The exact solution of the above problem can be deduced by
separating variables:

$$\varphi(x) = \tan x = x + \frac{x^3}{3} + \frac{2}{15}x^5 + \frac{17}{315}x^7 + \cdots \quad \left(-\frac{\pi}{2} < x < \frac{\pi}{2}\right).$$

The first three terms of $\varphi_3(x)$ and of the series above are the same. The
series converges only for $|x| < \pi/2$; therefore, all that we can expect is that
our sequence $\varphi_1, \varphi_2, \cdots$ converges to a function that is the solution of our
problem for $|x| < \pi/2$.


2. By applying the Picard method to
$$y'(x) = xy(x), \quad y(0) = 1, \tag{15.37}$$
show that the Picard series $\{\varphi_n(x)\}$ converges absolutely and uniformly.


Solution: The integral equation corresponding to (15.37) becomes
$$y(x) = 1 + \int_{0}^{x} t\,y(t)\,dt.$$
The iterative equation is written as $\varphi_0(x) = 1$ and

$$\varphi_{n+1}(x) = 1 + \int_{0}^{x} t\,\varphi_n(t)\,dt \quad (n = 0, 1, 2, \cdots).$$

Thus, we easily find

$$\varphi_n(x) = 1 + \frac{x^2}{2} + \frac{1}{2!}\left(\frac{x^2}{2}\right)^2 + \cdots + \frac{1}{n!}\left(\frac{x^2}{2}\right)^n.$$

The nature of the convergence is obvious for all real $x$, since it is
a partial sum for the Taylor series of the function $\varphi(x) = e^{x^2/2}$.
This means that $\varphi_n(x) \to \varphi(x)$ as $n \to \infty$. ♣

3. For the equation given by

\[ y'(x) = 2\,y(x)^{1/2}, \qquad y(0) = 0, \]

check the uniqueness of the solution in connection with the Lipschitz condition.

Solution: This equation has the two solutions $y(x) = 0$ and $y(x) = x^2$
for $x \ge 0$ (continued by $y = 0$ for $x < 0$), although $f(x,y) = 2y^{1/2}$ is
continuous for all $y \ge 0$. The Lipschitz condition (15.26) is violated in any
region that includes the line $y = 0$ because for $y_1 = 0$ and $y_2 > 0$ we have

\[ \frac{|f(x,y_2) - f(x,y_1)|}{|y_2 - y_1|} = \frac{2\sqrt{y_2}}{y_2} = \frac{2}{\sqrt{y_2}}, \]

which diverges for $y_2 \to 0$ and therefore exceeds any fixed constant K. ♣


15.3 Sturm—Liouville Problems 


15.3.1 Sturm—Liouville Equation 


ODEs encountered in physics are often classified as Sturm—Liouville equa- 
tions: 


♠ Sturm–Liouville equation:
A Sturm–Liouville equation is a second-order homogeneous linear ODE
of the form

\[ \frac{d}{dx}\left[ p(x)\frac{dy}{dx} \right] + q(x)\,y + \lambda\,w(x)\,y = 0, \tag{15.38} \]

where $\lambda$ is a parameter and p, q, w are real-valued continuous functions
with $p(x) > 0$ and $w(x) > 0$. Here w(x) is called a weight function.


Using the Sturm–Liouville operator L defined by

\[ L = \frac{1}{w(x)}\left[ \frac{d}{dx}\left( p(x)\frac{d}{dx} \right) + q(x) \right], \tag{15.39} \]

we reduce the Sturm–Liouville equation (15.38) to the abbreviated form

\[ L\,y(x) = -\lambda\,y(x). \tag{15.40} \]

Examples The Legendre equation

\[ (1 - x^2)\,y'' - 2x\,y' + n(n+1)\,y = 0, \qquad n > 1, \quad x \in [-1, 1] \]

is expressed as

\[ \left[ (1 - x^2)\,y' \right]' + n(n+1)\,y = 0. \]

This is in the Sturm–Liouville form with $p = 1 - x^2$, $q = 0$, $w = 1$, and
$\lambda = n(n+1)$.


Relevant terminology is given below.

♠ Sturm–Liouville system:
A Sturm–Liouville system consists of a Sturm–Liouville equation
(15.38) on a finite closed interval $a \le x \le b$, together with two separated
boundary conditions of the form

\[ y(a) = \alpha\,y'(a) \quad\text{and}\quad y(b) = \beta\,y'(b) \]

with $\alpha$, $\beta$ being real.
A nontrivial solution of a Sturm–Liouville system is called an eigenfunction
and the corresponding $\lambda$ is called an eigenvalue. The set of all eigenvalues
of a Sturm–Liouville system is called the spectrum of the system.

Examples The Sturm–Liouville system consisting of the ODE

\[ y'' + \lambda y = 0, \qquad 0 \le x \le \pi \]

with the separated boundary conditions

\[ y(0) = 0, \qquad y(\pi) = 0 \]

has the eigenfunctions

\[ y_n(x) = \sin nx \qquad (n = 1, 2, \ldots) \]

and the eigenvalues

\[ \lambda_n = n^2. \]
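
The spectrum of the example system $y'' + \lambda y = 0$, $y(0) = y(\pi) = 0$ can also be located numerically. The sketch below uses a shooting method (integrate from x = 0 with a trial $\lambda$ and adjust $\lambda$ until $y(\pi) = 0$); this technique is an illustrative choice that does not appear in the text, and the names `shoot` and `eigenvalue`, the step count, and the bracketing intervals are assumptions.

```python
import math


def shoot(lam, steps=2000):
    """Integrate y'' = -lam*y on [0, pi] with y(0)=0, y'(0)=1 by RK4; return y(pi)."""
    h = math.pi / steps
    y, v = 0.0, 1.0
    for _ in range(steps):
        # Classical RK4 for the first-order system y' = v, v' = -lam*y
        k1y, k1v = v, -lam * y
        k2y, k2v = v + 0.5 * h * k1v, -lam * (y + 0.5 * h * k1y)
        k3y, k3v = v + 0.5 * h * k2v, -lam * (y + 0.5 * h * k2y)
        k4y, k4v = v + h * k3v, -lam * (y + h * k3y)
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
        v += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
    return y


def eigenvalue(lo, hi, tol=1e-10):
    """Bisection on lam in [lo, hi]; assumes y(pi; lam) changes sign on the interval."""
    f_lo = shoot(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = shoot(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)
```

Calling `eigenvalue(0.5, 1.5)`, `eigenvalue(3.5, 4.5)`, and `eigenvalue(8.5, 9.5)` brackets the first three eigenvalues of the example, which should come out close to 1, 4, and 9.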


15.3.2 Conversion into a Sturm—Liouville Equation 


Mathematically, Sturm–Liouville equations represent only a small fraction of
the second-order differential equations. Nevertheless, any second-order equation
of the form

\[ a(x)\,y'' + b(x)\,y' + c(x)\,y + \lambda\,e(x)\,y = 0 \]

with $a(x) \ne 0$ can be transformed into a Sturm–Liouville equation by
multiplying it by the factor

\[ \xi(x) = \exp\left[ \int^x \frac{b(s) - a'(s)}{a(s)}\,ds \right], \tag{15.41} \]

which satisfies $(\xi a)' = \xi b$ and therefore yields a Sturm–Liouville form,

\[ (\xi a\,y')' + \xi c\,y + \lambda\,\xi e\,y = 0, \]

with a nonnegative weight function $\xi(x)e(x)$.

Examples We show below that the Hermite equation of the form

\[ y'' - 2x\,y' + 2\alpha\,y = 0 \tag{15.42} \]

can be transformed into a Sturm–Liouville equation. Substituting $a(x) = 1$
and $b(x) = -2x$ into (15.41) yields

\[ \xi(x) = \exp\left[ \int^x (-2s)\,ds \right] = e^{-x^2}, \]

by which we multiply both sides of (15.42) to obtain

\[ e^{-x^2}y'' - 2x\,e^{-x^2}y' + 2\alpha\,e^{-x^2}y = \left( e^{-x^2}y' \right)' + 2\alpha\,e^{-x^2}y = 0. \]

This is the Sturm–Liouville form with

\[ p(x) = e^{-x^2}, \qquad q(x) = 0, \qquad w(x) = e^{-x^2}, \]

and $\lambda = 2\alpha$.


15.3.3 Self-adjoint Operators 


We know many facts about Sturm—Liouville problems. Below is an important 
concept regarding the nature of these problems. 


♠ Adjoint operator:
The adjoint of an operator L, denoted by $L^\dagger$, is defined by

\[ \int_a^b f^*(x)\,[L\,g(x)]\,dx = \left\{ \int_a^b g^*(x)\,[L^\dagger f(x)]\,dx \right\}^*. \tag{15.43} \]

Using inner product notation, we can write the definition of the adjoint
operator (15.43) as

\[ (f, Lg) = (g, L^\dagger f)^*. \]


The most important terminology in this section is given below.

♠ Self-adjoint operator:
An operator L is called self-adjoint (or Hermitian) if

\[ L = L^\dagger \]

or, in inner product notation,

\[ (f, Lg) = (g, Lf)^*. \]


It should be noted that an operator is said to be self-adjoint only if certain
boundary conditions are met by the functions f and g on which it acts. An
illustrative example follows:

Examples Let us derive the required boundary conditions for the linear operator

\[ L = \frac{d^2}{dx^2} \]

to be self-adjoint over the interval [a, b]. From the definition of self-adjoint
operators, the operator L should satisfy the relation:

\[ \int_a^b f^*\,\frac{d^2 g}{dx^2}\,dx = \left( \int_a^b g^*\,\frac{d^2 f}{dx^2}\,dx \right)^*. \tag{15.44} \]

Through integration by parts, the left-hand side gives

\[ \int_a^b f^*\,\frac{d^2 g}{dx^2}\,dx = \left[ f^*\frac{dg}{dx} \right]_a^b + \left[ -g\,\frac{df^*}{dx} \right]_a^b + \int_a^b g\,\frac{d^2 f^*}{dx^2}\,dx. \tag{15.45} \]

From a comparison of (15.44) and (15.45), it follows that the operator L is
Hermitian provided that

\[ \left[ f^*\frac{dg}{dx} \right]_a^b = \left[ g\,\frac{df^*}{dx} \right]_a^b. \]


15.3.4 Required Boundary Condition 


In the example in Sect. 15.3.3, we derived the required boundary condition 
for a specific Sturm—Liouville operator to be self-adjoint. For general Sturm— 
Liouville operators, such a required boundary condition is given by the fol- 
lowing theorem. 


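♠ Theorem:
A Sturm–Liouville operator is self-adjoint on [a, b] if any two eigenfunctions
$y_i$ and $y_j$ of (15.38) satisfy the boundary condition

\[ \left[ y_i^*\,p\,y_j' \right]_a^b = 0. \tag{15.46} \]

Proof Taking the inner product with the weight function w,
$(f, g) = \int_a^b f^*(x)\,g(x)\,w(x)\,dx$, the factor 1/w in the explicit form (15.39)
of the Sturm–Liouville operator L cancels, and we have

\[ (y_i, L y_j) = \int_a^b y_i^*\,(p\,y_j')'\,dx + \int_a^b y_i^*\,q\,y_j\,dx. \tag{15.47} \]

The first integral is integrated by parts to give

\[ \left[ y_i^*\,p\,y_j' \right]_a^b - \int_a^b (y_i^*)'\,p\,y_j'\,dx, \]

in which the first term vanishes because we have assumed the boundary condition
(15.46). Integration by parts then yields

\[ -\left[ (y_i^*)'\,p\,y_j \right]_a^b + \int_a^b \left[ (y_i^*)'\,p \right]' y_j\,dx, \]

where the first term is again zero owing to our assumption (applied with i and j
interchanged and complex-conjugated). As a result, the sum of integrals in
(15.47) reads

\[ (y_i, L y_j) = \int_a^b \left\{ \left[ (y_i^*)'\,p \right]' y_j + y_i^*\,q\,y_j \right\} dx \tag{15.48} \]

\[ = \left\{ \int_a^b \left[ y_j^*\,(p\,y_i')' + y_j^*\,q\,y_i \right] dx \right\}^* = (y_j, L y_i)^*, \tag{15.49} \]

which completes the proof. ♣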


15.3.5 Reality of Eigenvalues 


♠ Theorem: For a Sturm–Liouville system under the boundary condition
(15.46), we have:

(a) All eigenvalues are real.

(b) Eigenfunctions corresponding to distinct eigenvalues are orthogonal.

Proof of (a). If an eigenfunction $y_n$ belongs to the eigenvalue $\lambda_n$, then

\[ \lambda_n^*\,(y_n, y_n) = (\lambda_n y_n, y_n) = -(L y_n, y_n) = -(y_n, L y_n) = \lambda_n\,(y_n, y_n). \]

This indicates that $\lambda_n^* = \lambda_n$ since $(y_n, y_n) > 0$. Therefore $\lambda_n$ is real for all n. ♣

Proof of (b). According to the same argument as above, and because the
eigenvalues are real,

\[ \lambda_m\,(y_m, y_n) = (\lambda_m y_m, y_n) = -(L y_m, y_n) = -(y_m, L y_n) = \lambda_n\,(y_m, y_n). \]

Thus, for $\lambda_m \ne \lambda_n$, $(y_m, y_n) = 0$, which means that eigenfunctions
corresponding to distinct eigenvalues are orthogonal. ♣

Remark. If eigenvalues are degenerate, say, $\lambda_m = \lambda_n$ ($m \ne n$), an orthogonal
set of eigenfunctions is constructed using the Gram–Schmidt orthogonalization
method. Namely, we can choose the eigenfunctions to be orthogonal
to each other with respect to the weight function w such that if $(y_m, y_n) \ne 0$,
we replace $y_n$ by $\tilde{y}_n = y_n - \alpha y_m$, where $\alpha$ is chosen so that $(y_m, \tilde{y}_n) = 0$,
i.e., $\alpha = (y_m, y_n)/(y_m, y_m)$.
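
The Gram–Schmidt step of the Remark can be sketched numerically as follows. This is only an illustration under stated assumptions: the functions are real-valued, the weighted inner product is approximated by Simpson's rule, and the helper names `inner` and `orthogonalize` are hypothetical.

```python
import math


def inner(f, g, w, a, b, n=2000):
    """Weighted inner product int_a^b f(x) g(x) w(x) dx for real f, g (Simpson rule)."""
    h = (b - a) / n  # n must be even
    total = f(a) * g(a) * w(a) + f(b) * g(b) * w(b)
    for k in range(1, n):
        x = a + k * h
        total += (4 if k % 2 else 2) * f(x) * g(x) * w(x)
    return total * h / 3


def orthogonalize(ym, yn, w, a, b):
    """Return yn_tilde = yn - alpha*ym with alpha = (ym, yn)/(ym, ym)."""
    alpha = inner(ym, yn, w, a, b) / inner(ym, ym, w, a, b)
    return lambda x: yn(x) - alpha * ym(x)


# Assumed example: cos x and cos x + sin x both solve y'' + y = 0 (same eigenvalue)
# on [0, 2*pi] with w = 1, but they are not orthogonal to each other.
w = lambda x: 1.0
ym = math.cos
yn = lambda x: math.cos(x) + math.sin(x)
yn_tilde = orthogonalize(ym, yn, w, 0.0, 2 * math.pi)
```

In this assumed example $\alpha = 1$, so the replacement function is simply sin x, which is orthogonal to cos x over a full period.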


Exercises 
1. Show that the Bessel equation given by

\[ x^2 y'' + x\,y' + (x^2 - n^2)\,y = 0 \quad\text{with } n > 0 \text{ and } x \in (-\infty, \infty) \]

can be expressed in the form of a Sturm–Liouville equation.

Solution: After the transformation $x \to kx$, we have

\[ \frac{d}{dx}\left[ x\,\frac{d}{dx}\,y(kx) \right] + \left( -\frac{n^2}{x} + k^2 x \right) y(kx) = 0, \qquad n > 0, \]

where $p = x$, $q = -n^2/x$, $w = x$, and the parameter $\lambda = k^2$ in
(15.38). ♣
2. The Bernoulli equation is given as a nonlinear equation by

\[ y' = a(x)\,y + b(x)\,y^k, \tag{15.50} \]

where a(x), b(x) are continuous functions in an interval I and k is an
arbitrary constant.

(a) Show that the transformation $u = y^{1-k}$ provides an inhomogeneous
linear equation for u.

(b) Find a solution for the transformed linear equation for u under the
initial condition $u(x_0) = u_0$.

Solution:
(a) The transformed equation becomes

\[ u' = (1 - k)\,a(x)\,u + (1 - k)\,b(x). \]

(b) The above equation is an inhomogeneous linear equation of the form

\[ u' = p(x)\,u + q(x), \]

where $p(x) = (1-k)a(x)$ and $q(x) = (1-k)b(x)$ are continuous
functions. Let P(x) be a function whose derivative is p(x)
such that

\[ P(x) = \int_{x_0}^x p(s)\,ds, \]

where $x_0$ is a fixed point in I. Multiplying both sides of
the linear equation by $e^{-P(x)}$, we have the relation

\[ \left( e^{-P}u \right)' = e^{-P}(u' - pu) = e^{-P}q. \]

Therefore, we obtain a solution such that

\[ u(x) = u_0\,e^{P(x)} + e^{P(x)}\int_{x_0}^x e^{-P(s)}\,q(s)\,ds, \]

where $u_0$ comes from the initial condition. ♣
3. The logistic equation is a special type of Bernoulli equation given by

\[ y' = a\,y - b\,y^2, \tag{15.51} \]

where a, b are constants. Find a solution for the above by imposing the
initial condition $y(x_0) = y_0$.

Solution: Using the solution of Exercise 2(b) with k = 2, we have

\[ y(x) = \frac{a}{b + (a/y_0 - b)\,e^{-a(x - x_0)}}. \]

Note that $y(x) \to a/b$ as $x \to \infty$ when $a > 0$. ♣
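
As a sanity check, the closed-form logistic solution can be compared with a direct numerical integration of (15.51). The sketch below assumes a classical fourth-order Runge–Kutta integrator; the function names and the sample parameter values are illustrative assumptions, not taken from the text.

```python
import math


def logistic_exact(x, a, b, x0, y0):
    """Closed form y = a / (b + (a/y0 - b) * exp(-a * (x - x0)))."""
    return a / (b + (a / y0 - b) * math.exp(-a * (x - x0)))


def logistic_rk4(x, a, b, x0, y0, steps=1000):
    """Integrate y' = a*y - b*y**2 from x0 to x with classical RK4."""
    f = lambda y: a * y - b * y * y
    h = (x - x0) / steps
    y = y0
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + 0.5 * h * k1)
        k3 = f(y + 0.5 * h * k2)
        k4 = f(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return y
```

For instance, with the assumed values a = 2, b = 0.5, $x_0 = 0$, $y_0 = 0.1$, both functions should agree closely at x = 3, and the exact solution approaches a/b = 4 for large x.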
4. The Riccati equation is a nonlinear equation given by

\[ y' + p(x)\,y + q(x)\,y^2 = r(x). \tag{15.52} \]

(a) Assuming u(x) to be a particular solution of (15.52), show that z(x)
defined by $y(x) = u(x) + z(x)$ satisfies a Bernoulli equation.

(b) Show that the Riccati equation is reduced to a linear equation of the
second order by the transformation $y = Q\,v'/v$.

Solution:
(a) Substituting $y = u + z$ into the equation, we have

\[ \left[ z' + p\,z + q\,(z^2 + 2uz) \right] + \left[ u' + p\,u + q\,u^2 - r \right] = 0. \]

The second bracket vanishes because u solves (15.52), and we have the
Bernoulli equation $z' + (p + 2uq)\,z + q\,z^2 = 0$.
(b) The first-order derivative gives

\[ y' = Q'\,\frac{v'}{v} + Q\,\frac{v''}{v} - Q\,\frac{v'^2}{v^2}. \]

Thus, we have

\[ Q\,\frac{v''}{v} + Q\,(qQ - 1)\,\frac{v'^2}{v^2} + (Q' + pQ)\,\frac{v'}{v} - r = 0. \]

Setting $Q = 1/q(x)$, we have $v'' + \left( p - \dfrac{q'}{q} \right) v' - q\,r\,v = 0$. ♣


16 


System of Ordinary Differential Equations 


Abstract In this chapter we focus on an autonomous system (Sect. 16.3), which 
is a specific type of system of ordinary differential equations. Autonomous systems 
can be used to describe the dynamics of the physical objects that are encountered 
in physics and engineering problems, wherein the laws governing the motion of the 
objects are time-independent, namely, they hold true at all times. The stability of 
these dynamical systems is characterized by the critical point (Sect. 16.3.3), whose 
nature is revealed by the functional form of the autonomous systems. 


16.1 Systems of ODEs 


16.1.1 Systems of the First-Order ODEs 


This section deals with n coupled ordinary differential equations (ODEs). The 
formal definition is stated below. 


♠ Systems of ODEs:
A system of ODEs is given by

\[ F_i\left( x;\; y_1, y_1', \ldots, y_1^{(r_{i1})};\; y_2, y_2', \ldots, y_2^{(r_{i2})};\; \ldots \right) = 0 \qquad (i = 1, 2, \ldots), \tag{16.1} \]

which involves a set of unknown functions $y_1(x), y_2(x), \ldots$ and their
derivatives with respect to a single independent variable x.

For the ith equation of (16.1), we denote the highest order of the derivatives
of $y_j$ by $r_{ij}$. Hereafter, we consider the case of $r_{ij} = 1$ for all i and j, i.e., a
system of n ordinary differential equations (ODEs) of the first order expressed
by

\[ \begin{aligned} y_1'(x) &= f_1(x, y_1, y_2, \ldots, y_n), \\ y_2'(x) &= f_2(x, y_1, y_2, \ldots, y_n), \\ &\;\;\vdots \\ y_n'(x) &= f_n(x, y_1, y_2, \ldots, y_n). \end{aligned} \tag{16.2} \]

Here, $\{f_k\}$, $k = 1, 2, \ldots, n$, are single-valued continuous functions in a certain
domain of their arguments and $\{y_k\}$, $k = 1, 2, \ldots, n$, are unknown complex
functions of a real variable x.


16.1.2 Column-Vector Notation 


For convenience we use column-vector notation for an ordered set of unknown
functions $\{y_k(x)\}$, in which each $y_k(x)$ is called a component, and
denote the vector by a bold-face letter:

\[ \mathbf{y}(x) = [y_1(x), y_2(x), \ldots, y_n(x)]^T, \tag{16.3} \]

where the norm of the vector is defined by

\[ \|\mathbf{y}(x)\| = \left( |y_1|^2 + |y_2|^2 + \cdots + |y_n|^2 \right)^{1/2}. \tag{16.4} \]

Using vector notation, we can express (16.2) in the concise form

\[ \mathbf{y}'(x) = \mathbf{f}(x, \mathbf{y}(x)), \tag{16.5} \]

where the column vector $\mathbf{f}$ is defined by its components

\[ \mathbf{f}(x, \mathbf{y}(x)) = [f_1, f_2, \ldots, f_n]^T. \tag{16.6} \]

If there exists a set of functions $\boldsymbol{\varphi}(x) = (\varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x))$ satisfying

\[ \varphi_i(x)' = f_i(x, \varphi_1(x), \varphi_2(x), \ldots, \varphi_n(x)), \qquad i = 1, 2, \ldots, n, \]

we say $\boldsymbol{\varphi}(x)$ is a solution of (16.2). The initial value problem consists of finding
a solution $\boldsymbol{\varphi}(x)$ of (16.5) in I satisfying the initial condition
$\boldsymbol{\varphi}(x_0) = \mathbf{y}_0 = (y_{10}, y_{20}, \ldots, y_{n0})$.


16.1.3 Reducing the Order of ODEs 
Let us consider an nth-order ODE for u(x) given by

\[ \frac{d^n u(x)}{dx^n} + p_1(x)\,\frac{d^{n-1}u(x)}{dx^{n-1}} + \cdots + p_n(x)\,u(x) = q(x). \tag{16.7} \]

Equation (16.7) can always be reduced to a system of n first-order
differential equations, as stated in the following theorem:

♠ Theorem:
Given an nth-order ODE, it can always be reduced to a system of n
first-order ODEs.

Proof We take u(x) and its derivatives $u', u'', \ldots, u^{(n-1)}$ as new unknown
functions defined by

\[ y_k(x) = \frac{d^{k-1}u(x)}{dx^{k-1}}, \qquad k = 1, 2, \ldots, n. \tag{16.8} \]

It is evident that (16.7) is equivalent to the following set of equations:

\[ y_1' = y_2, \quad y_2' = y_3, \quad \ldots, \quad y_{n-1}' = y_n \tag{16.9} \]

and

\[ y_n' = -p_1 y_n - p_2 y_{n-1} - \cdots - p_n y_1 + q. \tag{16.10} \]

Equations (16.9) and (16.10) can be written in a brief vector form as

\[ \frac{d\mathbf{y}(x)}{dx} = \mathbf{f}(x, \mathbf{y}), \tag{16.11} \]

where the column vectors are defined as

\[ \mathbf{y} = (y_1, y_2, \ldots, y_n)^T \]

and

\[ \mathbf{f} = \mathbf{f}(x, \mathbf{y}) = [y_2, y_3, \ldots, y_n, \; -p_1 y_n - p_2 y_{n-1} - \cdots - p_n y_1 + q]^T. \; ♣ \]


Example One of the most famous systems of the type (16.11) results from the
equation of motion for a particle of mass m. For a particle moving along the
x-axis, the equation of motion is

\[ m\,\frac{d^2 x}{dt^2} = F\left( t, x(t), \frac{dx}{dt} \right), \tag{16.12} \]

where t is the time and F represents the force acting on the particle. To see
how the second-order ODE (16.12) can be viewed as a system of the form
(16.11), we make the following substitutions:

\[ t \to x, \qquad x \to y_1, \qquad \frac{dx}{dt} \to y_2. \]

Then (16.12) is equivalent to a system of two equations:

\[ y_1' = y_2, \qquad y_2' = \frac{1}{m}\,F(x, y_1, y_2), \]

which is of the form $\mathbf{y}'(x) = \mathbf{f}(x, \mathbf{y})$.
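
Once an equation is in the first-order form $\mathbf{y}' = \mathbf{f}(x, \mathbf{y})$, a single generic integrator applies to it. The following is a minimal sketch, assuming a classical Runge–Kutta scheme and a hypothetical linear spring force $F = -y_1$ with m = 1 (chosen only because its solution $y_1 = \cos x$ is known); the names `rk4_system` and `newton_system` are illustrative.

```python
import math


def rk4_system(f, x0, y0, x1, steps=1000):
    """Integrate y' = f(x, y) for a list-valued y from x0 to x1 with classical RK4."""
    h = (x1 - x0) / steps
    x, y = x0, list(y0)
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = f(x + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = f(x + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        x += h
    return y


def newton_system(force, m):
    """Right-hand side (y2, F/m) of the first-order system equivalent to (16.12)."""
    return lambda x, y: [y[1], force(x, y[0], y[1]) / m]


# Hypothetical force F = -y1 with m = 1, started from y1 = 1, y2 = 0.
spring = newton_system(lambda x, y1, y2: -y1, m=1.0)
y_end = rk4_system(spring, 0.0, [1.0, 0.0], math.pi)
```

With these assumed data the exact state at $x = \pi$ is $(\cos\pi, -\sin\pi) = (-1, 0)$, which `y_end` should approximate.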


16.1.4 Lipschitz Condition in Vector Spaces 


The vector equation

\[ \mathbf{y}'(x) = \mathbf{f}(x, \mathbf{y}) \tag{16.13} \]

is obviously analogous to the scalar equation

\[ y'(x) = f(x, y). \]

This analogy implies the possibility that the definition of a Lipschitz condition
can be extended to the vector equation. The extended Lipschitz condition
provides a simple sufficient condition for the existence and uniqueness of
solutions, which implies that all the theorems for the scalar equation can be
generalized so as to hold for the vector equation.

♠ Lipschitz condition for a vector function:
A vector function $\mathbf{f}(x, \mathbf{y})$ in (16.13) is said to satisfy the Lipschitz
condition on a region R if and only if

\[ \|\mathbf{f}(x, \mathbf{y}(x)) - \mathbf{f}(x, \mathbf{z}(x))\| \le K\,\|\mathbf{y}(x) - \mathbf{z}(x)\| \]
\[ (R:\; |x - x_0| \le a, \quad \|\mathbf{y} - \mathbf{y}_0\| \le b, \quad \|\mathbf{z} - \mathbf{z}_0\| \le b) \tag{16.14} \]

for the Lipschitz constant K.


When $\mathbf{f}(x, \mathbf{y})$ satisfies the Lipschitz condition noted above, we see from (16.14)
that

\[ |f_k(x, y_1, y_2, \ldots, y_n) - f_k(x, z_1, z_2, \ldots, z_n)| \le K\left( |y_1 - z_1| + |y_2 - z_2| + \cdots + |y_n - z_n| \right) \quad (k = 1, 2, \ldots, n). \tag{16.15} \]

Using this, we can prove the theorem of the existence and uniqueness of
solutions for the general vector equation (16.13). For instance, the uniqueness
of the solution for (16.13) is straightforward, as shown below. Let $\mathbf{y}(x)$ and
$\mathbf{z}(x)$ be two solutions of (16.13) with the same initial value at $x_0$. Then
(16.15) yields

\[ \sum_{k=1}^n |y_k(x) - z_k(x)| \le \sum_{k=1}^n \int_{x_0}^x |f_k(t, \mathbf{y}(t)) - f_k(t, \mathbf{z}(t))|\,dt \le nK \int_{x_0}^x \sum_{k=1}^n |y_k(t) - z_k(t)|\,dt, \tag{16.16} \]

which holds for the interval I: $x \in [x_0, x_0 + \delta]$ for any small $\delta$. Since the
left-hand side of (16.16) is continuous on I, it has a maximum at some x, which
we label $\mu$. Then, the inequality (16.16) becomes

\[ \mu \le nK\mu\,(x - x_0) \le nK\mu\,\delta, \]

which gives us $\mu(1 - nK\delta) \le 0$. For $\delta < 1/(nK)$, this forces $\mu = 0$, which
indicates that $\sum_k |y_k - z_k| = 0$ on I. The same holds true for the case $x < x_0$. Thus,
the solution of (16.13) is unique.


Exercises 
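1. Consider an initial value problem given by

\[ \mathbf{y}' = \mathbf{f}(x, \mathbf{y}), \qquad \mathbf{y}(x_0) = \mathbf{y}_0, \]

defined on R: $|x - x_0| \le a$, $\|\mathbf{y} - \mathbf{y}_0\| \le b$ ($a, b > 0$). Assuming that $\mathbf{f}$ is
continuous on R, a sequence of successive approximations $\boldsymbol{\varphi}_0, \boldsymbol{\varphi}_1, \ldots$ is
given by

\[ \boldsymbol{\varphi}_0(x) = \mathbf{y}_0 \]

and

\[ \boldsymbol{\varphi}_{n+1}(x) = \mathbf{y}_0 + \int_{x_0}^x \mathbf{f}(t, \boldsymbol{\varphi}_n(t))\,dt \quad\text{for } n = 0, 1, 2, \ldots. \]

Using this procedure, find a sequence of successive approximations for

\[ (y_1', y_2') = (y_2, -y_1), \quad\text{for } \mathbf{y}(0) = (0, 1). \]

Solution: Here $\mathbf{f}(x, \mathbf{y}) = (y_2, -y_1)$, so we have

\[ \boldsymbol{\varphi}_0(x) = (0, 1), \]
\[ \boldsymbol{\varphi}_1(x) = (0, 1) + \int_0^x (1, 0)\,dt = (x, 1), \]
\[ \boldsymbol{\varphi}_2(x) = (0, 1) + \int_0^x (1, -t)\,dt = (0, 1) + \left( x, -\frac{x^2}{2} \right) = \left( x, 1 - \frac{x^2}{2} \right). \]

Continuing with this process, we find the solution of the problem as
$\boldsymbol{\varphi}_n(x) \to \boldsymbol{\varphi}(x) = (\sin x, \cos x)$. ♣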


16.2 Linear System of ODEs 


16.2.1 Basic Terminology 


We now focus on a particular class of systems of ODEs called a linear system
of first-order ODEs, described by

\[ \begin{aligned} \frac{dy_1(x)}{dx} - \sum_{j=1}^n a_{1j}(x)\,y_j(x) &= q_1(x), \\ \frac{dy_2(x)}{dx} - \sum_{j=1}^n a_{2j}(x)\,y_j(x) &= q_2(x), \\ &\;\;\vdots \\ \frac{dy_n(x)}{dx} - \sum_{j=1}^n a_{nj}(x)\,y_j(x) &= q_n(x). \end{aligned} \]

Here $a_{kj}(x)$ and $q_k(x)$ with $j, k = 1, 2, \ldots, n$ are continuous functions
of x on some interval I. For convenience we use the vector representation
given by

\[ \frac{d\mathbf{y}(x)}{dx} - A(x)\,\mathbf{y}(x) = \mathbf{q}(x), \tag{16.17} \]

where $A = [a_{kj}]$ is an $n \times n$ matrix. Therefore, $A\mathbf{y}$ stands for the matrix
A applied to the column vector $\mathbf{y} = [y_1, y_2, \ldots, y_n]^T$, namely, the linear
transform of $\mathbf{y}$ by A. The vector $\mathbf{q}$ is defined as $\mathbf{q} = [q_1, q_2, \ldots, q_n]^T$. Given
any $\mathbf{y}(x_0)$ for $x_0$ in I, there exists a unique solution $\boldsymbol{\varphi}(x)$ on I such that

\[ \boldsymbol{\varphi}(x_0) = [y_1(x_0), y_2(x_0), \ldots, y_n(x_0)]^T. \]

Applying the linear operator L to (16.17) yields

\[ L[\mathbf{y}(x)] = \mathbf{q}(x), \tag{16.18} \]

where L is defined as

\[ L = \frac{d}{dx} - A(x). \]

If $\mathbf{q}(x) = 0$ for all x on I, (16.17) is said to be a linear homogeneous
system of nth order, expressed by

\[ \frac{d\mathbf{y}(x)}{dx} - A(x)\,\mathbf{y}(x) = 0. \tag{16.19} \]

Otherwise, (16.17) is called inhomogeneous. A homogeneous system obtained
from the inhomogeneous system (16.17) by setting $\mathbf{q}(x) = 0$ is called
the reduced or complementary system.

Remark. Note that every linear homogeneous system always has the trivial
solution $\boldsymbol{\varphi}(x) = 0$, as can be immediately checked. From the uniqueness of
the solution, therefore, no nontrivial solution can vanish at any point x of I.


16.2.2 Vector Space of Solutions 


Let $\boldsymbol{\varphi}_i(x)$ ($i = 1, 2, \ldots$) be solutions for an n-dimensional linear homogeneous
system

\[ \mathbf{y}'(x) = A(x)\,\mathbf{y}(x). \tag{16.20} \]

Referring to the axioms given in Sect. 4.2.1, it readily follows that the solutions
$\{\boldsymbol{\varphi}_i(x)\}$ form a vector space V. Indeed, if $\boldsymbol{\varphi}_1(x)$ and $\boldsymbol{\varphi}_2(x)$ are solutions of
(16.20), then $c_1\boldsymbol{\varphi}_1(x) + c_2\boldsymbol{\varphi}_2(x)$ with arbitrary constants $c_1$, $c_2$ is also a solution
of (16.20), and so on.

Now we pose a question as to the dimension of the vector space V mentioned
above. We have the answer in the following theorem:


♠ Theorem:
Solutions of the system (16.20) on an interval I form an n-dimensional
vector space if the $n \times n$ matrix A(x) is continuous on I.

Proof The continuity of A(x) implies that none of its components diverges on I.
This allows us to set a constant K,

\[ K = \max_{x \in I} \sum_{i,k=1}^n |a_{ki}(x)|, \]

and it then follows that the vector $\mathbf{f}$ defined by $\mathbf{f}(x, \mathbf{y}) = A(x)\,\mathbf{y}$ satisfies the
Lipschitz condition:

\[ \|\mathbf{f}(x, \mathbf{y}) - \mathbf{f}(x, \mathbf{z})\| \le K\,\|\mathbf{y} - \mathbf{z}\| \quad\text{for } x \in I. \]

From the existence and uniqueness theorems we know that there are n solutions
$\boldsymbol{\varphi}_i(x)$ of (16.20) such that each solution exists on the entire interval I
and satisfies the initial condition

\[ \boldsymbol{\varphi}_i(x_0) = \mathbf{e}_i \quad (i = 1, 2, \ldots, n) \quad\text{for } x_0 \in I, \tag{16.21} \]

where the $\mathbf{e}_i$'s are n linearly independent vectors.
We tentatively assume that the solutions $\boldsymbol{\varphi}_i$ are linearly dependent on I.
Then there exist constants $c_i$, not all zero, such that

\[ \sum_{i=1}^n c_i\,\boldsymbol{\varphi}_i(x) = 0 \quad\text{for every } x \text{ on } I. \]

In particular, setting $x = x_0$ and using the initial condition (16.21), we have

\[ \sum_{i=1}^n c_i\,\mathbf{e}_i = 0, \]

which contradicts the assumed linear independence of the $\mathbf{e}_i$. Hence, we conclude
that the solutions $\boldsymbol{\varphi}_i$ are linearly independent on I.

Next we prove the completeness of $\{\boldsymbol{\varphi}_i(x)\}$; i.e., that every solution $\boldsymbol{\psi}(x)$ of
(16.20) can be expanded as a linear combination of the $\boldsymbol{\varphi}_i(x)$ satisfying the initial
condition (16.21). Since the $\mathbf{e}_i$ are linearly independent in the n-dimensional
Euclidean space $E_n$, they form a basis for $E_n$ and there exist unique constants
$b_i$ such that the constant vector $\boldsymbol{\psi}(x_0)$ can be expressed as

\[ \boldsymbol{\psi}(x_0) = \sum_{i=1}^n b_i\,\mathbf{e}_i. \tag{16.22} \]

Consider the vector

\[ \boldsymbol{\varphi}(x) = \sum_{i=1}^n b_i\,\boldsymbol{\varphi}_i(x), \]

where the $b_i$ are identical to those in (16.22). Clearly $\boldsymbol{\varphi}(x)$ is a solution of
(16.20) on I. In addition, the initial value of $\boldsymbol{\varphi}$ reads

\[ \boldsymbol{\varphi}(x_0) = \sum_{i=1}^n b_i\,\mathbf{e}_i, \]

so that $\boldsymbol{\varphi}(x_0) = \boldsymbol{\psi}(x_0)$. In view of the uniqueness theorem, we have

\[ \boldsymbol{\varphi}(x) = \boldsymbol{\psi}(x) \quad\text{for every } x \text{ on } I. \]

This leads to the conclusion that every solution $\boldsymbol{\psi}(x)$ of an nth-order linear
homogeneous system (16.20) is expressed by the unique linear combination

\[ \boldsymbol{\psi}(x) = \sum_{i=1}^n b_i\,\boldsymbol{\varphi}_i(x) \quad\text{for every } x \text{ on } I, \]

where the $b_i$ are uniquely determined once we have $\boldsymbol{\psi}(x)$. As a result, the n
solutions $\boldsymbol{\varphi}_i(x)$ of the system (16.20) form a basis for an n-dimensional
vector space. ♣


16.2.3 Fundamental Systems of Solutions 


Again let $\boldsymbol{\varphi}_i(x) = [\varphi_{1i}(x), \ldots, \varphi_{ni}(x)]^T$ ($i = 1, 2, \ldots, n$) be solutions of the
linear homogeneous system (16.20) such that

\[ \boldsymbol{\varphi}_i(x)' = A(x)\,\boldsymbol{\varphi}_i(x) \quad\text{for all } i = 1, 2, \ldots, n. \]

Note here that $\{\boldsymbol{\varphi}_i(x)\}$ may or may not be linearly independent, since no
initial condition is imposed (contrary to the case of (16.21)). Specifically, if
the set $\{\boldsymbol{\varphi}_i(x)\}$ is endowed with the linear independence property, it is called
the fundamental system of solutions of (16.20).

♠ Fundamental system of solutions:
A collection of n solutions $\{\boldsymbol{\varphi}_i(x)\}$ of an n-dimensional linear homogeneous
system is called a fundamental system of solutions of the system
if it is linearly independent.

Remark. The significance of a fundamental system of solutions lies in the fact
that it can describe any solution $\boldsymbol{\varphi}(x)$ of the corresponding linear homogeneous
system. Consequently, the problem of finding a solution $\boldsymbol{\varphi}(x)$ becomes
equivalent to that of finding n linearly independent solutions.


With this terminology, the theorem presented in Sect. 16.2.2 leads to the
following result:

♠ Theorem:
A fundamental system of solutions exists for an arbitrary linear homogeneous
system.

Example The second-order equation

\[ y''(t) + y(t) = 0 \tag{16.23} \]

is equivalent to the two-dimensional linear system

\[ \mathbf{u}'(t) = A\,\mathbf{u}(t) \tag{16.24} \]

with

\[ \mathbf{u} = \begin{bmatrix} y \\ y' \end{bmatrix} \quad\text{and}\quad A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}. \]

The fundamental system of solutions of (16.24) is given by

\[ \boldsymbol{\varphi}_1(t) = [\cos t, -\sin t]^T \quad\text{and}\quad \boldsymbol{\varphi}_2(t) = [\sin t, \cos t]^T, \]

whose linear independence follows from the fact that $c_1\cos t + c_2\sin t = 0$ for all t
implies $c_1 = c_2 = 0$. Furthermore, $\boldsymbol{\varphi}_1(0) = (1, 0)$ and $\boldsymbol{\varphi}_2(0) = (0, 1)$, so any
solution $\boldsymbol{\varphi}(t)$ is given by

\[ \boldsymbol{\varphi}(t) = a_0\,\boldsymbol{\varphi}_1(t) + b_0\,\boldsymbol{\varphi}_2(t) \quad\text{for } -\infty < t < \infty, \tag{16.25} \]

where $\boldsymbol{\varphi}(0) = (a_0, b_0)$.

Remark. The solution $\boldsymbol{\varphi}(t)$ in (16.25) corresponds to the solution of the
second-order ODE (16.23) satisfying the initial conditions $y(0) = a_0$ and
$y'(0) = b_0$.


16.2.4 Wronskian for a System of ODEs 


The theorems given in Sect. 16.2.2 and 16.2.3 ensure the existence of a fundamental
system of solutions for any linear homogeneous system of the form

\[ \mathbf{y}'(x) = A(x)\,\mathbf{y}(x). \tag{16.26} \]

However, they provide no information as to whether a certain set of solutions
is a fundamental system or not. In what follows, we consider the criteria
concerning this issue. Following are preliminary concepts that we need in
order to proceed.

♠ Wronsky determinant:
Let $\{\boldsymbol{\varphi}_k(x)\}$ ($k = 1, 2, \ldots, n$) be solutions of (16.26), where
$\boldsymbol{\varphi}_k(x) = [\varphi_{1k}(x), \ldots, \varphi_{nk}(x)]^T$. Then the scalar function

\[ W(x) = \det \begin{bmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1n} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2n} \\ \vdots & \vdots & & \vdots \\ \varphi_{n1} & \varphi_{n2} & \cdots & \varphi_{nn} \end{bmatrix} \tag{16.27} \]

is called the Wronsky determinant (or the Wronskian) of the solutions
$\{\boldsymbol{\varphi}_k(x)\}$.

If $\{\boldsymbol{\varphi}_k(x)\}$ is a fundamental system of solutions of (16.26), then the matrix
corresponding to W(x) is called a fundamental matrix. Hence, a fundamental
matrix is a matrix whose columns form a fundamental system of solutions
of (16.26).

Example For the two-dimensional system given in Sect. 16.2.3, the matrix

\[ \Phi(t) = \begin{pmatrix} \cos t & \sin t \\ -\sin t & \cos t \end{pmatrix}, \qquad -\infty < t < \infty \]

is a fundamental matrix and W(t) = 1 for all t.


16.2.5 Liouville Formula for a Wronskian 


The following theorem shows that, given any n solutions of (16.26) and any $x_0$
in $(r_1, r_2)$, we can completely determine the corresponding Wronskian without
computing the $n \times n$ determinant.

♠ Liouville formula:
Let $\{\boldsymbol{\varphi}_k(x)\}$ ($k = 1, 2, \ldots, n$) be any n solutions of (16.26) and let $x_0$ be
in $(r_1, r_2)$. Then the Wronskian of $\{\boldsymbol{\varphi}_k(x)\}$ for $x \in (r_1, r_2)$ is given by

\[ W(x) = W(x_0)\exp\left[ \int_{x_0}^x \mathrm{tr}\,A(s)\,ds \right]. \]

See Exercise 2 for the proof. Since $\exp\left[ \int_{x_0}^x \mathrm{tr}\,A(s)\,ds \right]$ is never zero, the
theorem implies that the Wronskian of any collection of n solutions of (16.26)
is identically zero or never zero on $(r_1, r_2)$. The latter case characterizes a
fundamental system, as shown by the following theorem:

♠ Theorem:
A necessary and sufficient condition for $\{\boldsymbol{\varphi}_k(x)\}$ ($k = 1, 2, \ldots, n$) to
be a fundamental system of solutions of (16.26) is that $W(x) \ne 0$ for
$r_1 < x < r_2$.

Proof Let $\{\boldsymbol{\varphi}_k(x)\}$ ($k = 1, 2, \ldots, n$) be a fundamental system of solutions of
(16.26) and let $\boldsymbol{\varphi}(x)$ be any nontrivial solution. Then there exist $c_1, \ldots, c_n$,
not all zero, such that $\boldsymbol{\varphi}(x) = \sum_{i=1}^n c_i\,\boldsymbol{\varphi}_i(x)$, and by the uniqueness of the
solutions the $c_i$ are unique. If $\mathbf{c} = [c_1, \ldots, c_n]^T$ and $\Phi(x)$ is the fundamental
matrix of $\{\boldsymbol{\varphi}_k(x)\}$, then the previous relation can be written as

\[ \boldsymbol{\varphi}(x) = \Phi(x)\,\mathbf{c}. \]

For any x in $(r_1, r_2)$, this is a system of n linear equations in the unknowns
$c_1, \ldots, c_n$. Since this has a unique solution in $\mathbf{c}$, $\det\Phi$ cannot be zero, i.e.,

\[ \det\Phi(x) = W(x) \ne 0 \quad\text{for any } x \in (r_1, r_2). \]

Conversely, $W(x) \ne 0$ for $r_1 < x < r_2$ implies that the columns
$\boldsymbol{\varphi}_1(x), \ldots, \boldsymbol{\varphi}_n(x)$ of $\Phi(x)$ are linearly independent for $r_1 < x < r_2$. Since
they are solutions of (16.26), they form a fundamental system of solutions. ♣
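
The Liouville formula can be checked numerically for a concrete system. The sketch below is only an illustration: the $2 \times 2$ matrix `A(x)` with trace $-x$ is a hypothetical choice, the matrix equation $\Phi' = A\Phi$ is integrated from $\Phi(0) = I$ with a classical Runge–Kutta scheme, and all function names are assumptions.

```python
import math


def A(x):
    """Hypothetical coefficient matrix with trace -x (chosen only for illustration)."""
    return [[0.0, 1.0], [-1.0, -x]]


def rhs(x, phi):
    """Phi' = A(x) Phi for a 2x2 matrix Phi stored row-wise as [p11, p12, p21, p22]."""
    a = A(x)
    return [a[0][0] * phi[0] + a[0][1] * phi[2], a[0][0] * phi[1] + a[0][1] * phi[3],
            a[1][0] * phi[0] + a[1][1] * phi[2], a[1][0] * phi[1] + a[1][1] * phi[3]]


def wronskian_numeric(x1, steps=2000):
    """Integrate Phi' = A Phi from Phi(0) = I with RK4 and return det Phi(x1)."""
    h = x1 / steps
    x, phi = 0.0, [1.0, 0.0, 0.0, 1.0]
    for _ in range(steps):
        k1 = rhs(x, phi)
        k2 = rhs(x + h / 2, [p + h / 2 * k for p, k in zip(phi, k1)])
        k3 = rhs(x + h / 2, [p + h / 2 * k for p, k in zip(phi, k2)])
        k4 = rhs(x + h, [p + h * k for p, k in zip(phi, k3)])
        phi = [p + h / 6 * (c1 + 2 * c2 + 2 * c3 + c4)
               for p, c1, c2, c3, c4 in zip(phi, k1, k2, k3, k4)]
        x += h
    return phi[0] * phi[3] - phi[1] * phi[2]


def wronskian_liouville(x1):
    """W(x1) = W(0) exp(int_0^x1 tr A(s) ds) with W(0) = 1 and tr A(s) = -s."""
    return math.exp(-x1 ** 2 / 2)
```

For this assumed matrix the two functions should agree to within the integration error; the determinant never has to be tracked through the individual solutions, which is the practical content of the formula.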


16.2.6 Wronskian for an nth-Order Linear ODE 


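The previous results for systems of ODEs can be applied to an nth-order
linear equation

\[ u^{(n)}(x) + a_1(x)\,u^{(n-1)}(x) + \cdots + a_n(x)\,u(x) = 0, \tag{16.28} \]

since (16.28) is transformed into a vector form as

\[ \mathbf{y}' = A\,\mathbf{y}, \tag{16.29} \]

where

\[ \mathbf{y} = \begin{bmatrix} u \\ u' \\ \vdots \\ u^{(n-1)} \end{bmatrix} \quad\text{and}\quad A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \\ -a_n & -a_{n-1} & -a_{n-2} & \cdots & -a_1 \end{pmatrix}. \]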


Relevant terminology and theorems are given below.

♠ Fundamental system of solutions:
A collection

\[ \xi_1(x), \ldots, \xi_n(x), \qquad r_1 < x < r_2 \]

of solutions of (16.28) is called a fundamental system of solutions of
(16.28) if it is linearly independent.

♠ Theorem:
A fundamental system of solutions of equation (16.28) exists.

Proof We know that a fundamental system of solutions of (16.29) exists, and
we express it by $\boldsymbol{\varphi}_1(x), \ldots, \boldsymbol{\varphi}_n(x)$, where $\boldsymbol{\varphi}_k(x) = [\varphi_{1k}(x), \ldots, \varphi_{nk}(x)]^T$.
Furthermore, we may assume that given $x_0$ in $(r_1, r_2)$,

\[ \boldsymbol{\varphi}_k(x_0) = [0, \ldots, 0, 1, 0, \ldots, 0]^T = \mathbf{e}_k, \qquad k = 1, 2, \ldots, n, \]

where the single nonzero component 1 in $\mathbf{e}_k$ is assigned to the kth place in the
square brackets. By the correspondence of solutions of (16.28) and (16.29), we
have

\[ \boldsymbol{\varphi}_k(x) = \left[ \xi_k(x), \xi_k'(x), \ldots, \xi_k^{(n-1)}(x) \right]^T \]

for some solution $u(x) = \xi_k(x)$ of (16.28). The collection $\xi_1(x), \ldots, \xi_n(x)$
comprises distinct nontrivial solutions, since they satisfy distinct initial
conditions and $\xi_k = 0$ for $r_1 < x < r_2$ would imply that $\boldsymbol{\varphi}_k(x) = 0$, which is
impossible.

Finally, if there existed constants $c_1, \ldots, c_n$, not all zero, such that
$\sum_{k=1}^n c_k\,\xi_k(x) = 0$ for $r_1 < x < r_2$, then

\[ \sum_{k=1}^n c_k\,\xi_k'(x) = 0, \;\; \ldots, \;\; \sum_{k=1}^n c_k\,\xi_k^{(n-1)}(x) = 0, \qquad r_1 < x < r_2. \]

This implies that

\[ \sum_{k=1}^n c_k\,\boldsymbol{\varphi}_k(x) = 0, \qquad r_1 < x < r_2, \]

which contradicts the fact that $\{\boldsymbol{\varphi}_k(x)\}$ is a fundamental system of (16.29). ♣

We now define the Wronskian of a collection of n solutions of (16.28).

♠ Wronsky determinant:
Given any collection $\xi_1(x), \ldots, \xi_n(x)$ of solutions of (16.28), then

\[ W(x) = \det \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_n \\ \xi_1' & \xi_2' & \cdots & \xi_n' \\ \vdots & \vdots & & \vdots \\ \xi_1^{(n-1)} & \xi_2^{(n-1)} & \cdots & \xi_n^{(n-1)} \end{bmatrix} \tag{16.30} \]

is called the Wronsky determinant (or the Wronskian) of the solutions
$\{\xi_k(x)\}$ ($k = 1, 2, \ldots, n$).

As before, if $\xi_1(x), \ldots, \xi_n(x)$ make up a fundamental system of (16.28), then
the matrix corresponding to W(x) is called a fundamental matrix. In
any case, note that the columns of the matrix corresponding to W(x) are
n solutions of the system (16.29). We may therefore immediately state a
result analogous to the Liouville formula given in Sect. 16.2.5, noting that
$\mathrm{tr}\,A(x) = -a_1(x)$:

♠ Theorem:
The Wronskian W(x) of any collection $\xi_1(x), \ldots, \xi_n(x)$ of solutions of
(16.28) satisfies the relation

\[ W(x) = W(x_0)\exp\left[ -\int_{x_0}^x a_1(s)\,ds \right], \qquad r_1 < x_0,\; x < r_2. \]

Finally, we have the result corresponding to the second theorem in Sect. 16.2.5, for
which the proof is virtually the same.

♠ Theorem:
A necessary and sufficient condition for $\xi_1(x), \ldots, \xi_n(x)$ to be a fundamental
system of solutions of equation (16.28) is that

\[ W(x) \ne 0 \quad\text{for } r_1 < x < r_2. \]


Example Assume a second-order equation

\[ y''(x) + a(x)\,y(x) = 0. \]

For any two solutions $\xi_1(x)$ and $\xi_2(x)$, we have

\[ W(x) = \det \begin{bmatrix} \xi_1(x) & \xi_2(x) \\ \xi_1'(x) & \xi_2'(x) \end{bmatrix} = \text{const.}, \]

because the coefficient of $y'$ vanishes.
The constant is nonzero if and only if $\xi_1$ and $\xi_2$ are linearly independent.

Remark. The fact that linear independence implies a nonvanishing Wronskian
is a property of solutions of linear equations; i.e., it does not hold for nonlinear
equations or for arbitrary functions. To see this, we consider the functions
$\xi_1(x) = x^3$ and $\xi_2(x) = |x|^3$.
They are linearly independent on $-\infty < x < \infty$, but

\[ W(x) = \det \begin{bmatrix} x^3 & |x|^3 \\ 3x^2 & 3x|x| \end{bmatrix} = 3x^4|x| - 3x^2|x|^3 = 0. \]

This results from the fact that $\xi_1(x)$ and $\xi_2(x)$ cannot both be solutions near
x = 0 of a second-order linear equation. In fact, they both satisfy $\xi(0) =
\xi'(0) = 0$ yet are distinct, which violates uniqueness.


16.2.7 Particular Solution of an Inhomogeneous System 


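We close this section by discussing an inhomogeneous linear equation

\[ \frac{d\mathbf{y}(x)}{dx} - A(x)\,\mathbf{y}(x) = \mathbf{q}(x). \tag{16.31} \]

Let $\mathbf{q}(x)$ be continuous in x on some interval I and let $\{\boldsymbol{\varphi}_k\}$ ($k = 1, 2, \ldots, n$)
be a fundamental system of solutions for the reduced equation of (16.31). A
general solution of (16.31) can be written as the sum

\[ \boldsymbol{\psi}(x) = \boldsymbol{\varphi}_p(x) + c_1\boldsymbol{\varphi}_1(x) + \cdots + c_n\boldsymbol{\varphi}_n(x), \tag{16.32} \]

where $\boldsymbol{\varphi}_p(x)$ is a particular solution of (16.31) with no adjustable parameter.

A particular solution can be obtained from a fundamental system $\{\boldsymbol{\varphi}_k\}$
($k = 1, 2, \ldots, n$) of the reduced equation (16.19) by means of the method of
variation of constant parameters. We assume a particular solution of the
form

\[ \boldsymbol{\varphi}_p(x) = C_1(x)\,\boldsymbol{\varphi}_1(x) + \cdots + C_n(x)\,\boldsymbol{\varphi}_n(x), \tag{16.33} \]

where the coefficients $\{C_k(x)\}$ ($k = 1, 2, \ldots, n$) are not constants, but unknown
functions of x. Differentiating (16.33) with respect to x and substituting it into
(16.31), we obtain

\[ \sum_{k=1}^n \left[ C_k'(x)\,\boldsymbol{\varphi}_k(x) + C_k(x)\,\boldsymbol{\varphi}_k'(x) \right] - A(x)\sum_{k=1}^n C_k(x)\,\boldsymbol{\varphi}_k(x) = \mathbf{q}(x). \tag{16.34} \]

Since $\{\boldsymbol{\varphi}_k\}$ ($k = 1, 2, \ldots, n$) are solutions of the reduced equation (16.19),
equation (16.34) yields

\[ \sum_{k=1}^n \boldsymbol{\varphi}_k(x)\,C_k'(x) = \mathbf{q}(x). \tag{16.35} \]

If we express $\boldsymbol{\varphi}_k(x)$ by its components as

\[ \boldsymbol{\varphi}_k(x) = [\varphi_{1k}(x), \varphi_{2k}(x), \ldots, \varphi_{nk}(x)]^T, \]

equation (16.35) becomes

\[ \sum_{j=1}^n \varphi_{kj}(x)\,C_j'(x) = q_k(x), \tag{16.36} \]

or equivalently,

\[ \begin{bmatrix} \varphi_{11} & \varphi_{12} & \cdots & \varphi_{1n} \\ \varphi_{21} & \varphi_{22} & \cdots & \varphi_{2n} \\ \vdots & \vdots & & \vdots \\ \varphi_{n1} & \varphi_{n2} & \cdots & \varphi_{nn} \end{bmatrix} \begin{bmatrix} C_1'(x) \\ C_2'(x) \\ \vdots \\ C_n'(x) \end{bmatrix} = \begin{bmatrix} q_1(x) \\ q_2(x) \\ \vdots \\ q_n(x) \end{bmatrix}. \tag{16.37} \]

The matrix $[\varphi_{kj}]$ on the left-hand side of (16.37) satisfies $\det[\varphi_{kj}] \ne 0$ because
of the linear independence of the fundamental system of solutions $\{\boldsymbol{\varphi}_k\}$.
Hence, multiplying both sides of (16.37) by the inverse matrix (see Sect. 18.1.7)
of $[\varphi_{kj}]$, we have

\[ C_k'(x) = p_k(x), \tag{16.38} \]

where $\{p_k(x)\}$, $k = 1, 2, \ldots, n$, are continuous functions obtained from
(16.37). Thus once the differential equation (16.38) is solved with respect
to $C_k(x)$, the solutions determine a particular solution of the form

\[ \boldsymbol{\varphi}_p(x) = \sum_{k=1}^n C_k(x)\,\boldsymbol{\varphi}_k(x). \]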
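
The steps (16.37) and (16.38) can be traced in a small numerical sketch. It assumes the two-dimensional system of Sect. 16.2.3 with fundamental solutions $[\cos t, -\sin t]^T$ and $[\sin t, \cos t]^T$, solves the $2 \times 2$ linear system for $C_k'$ by Cramer's rule, and integrates with Simpson's rule starting from $C_k(0) = 0$; the function names, the lower limit 0, and the sample inhomogeneity are assumptions made for illustration.

```python
import math


def fundamental(t):
    """Fundamental matrix of u' = A u with A = [[0, 1], [-1, 0]]; columns are phi_1, phi_2."""
    return [[math.cos(t), math.sin(t)], [-math.sin(t), math.cos(t)]]


def c_prime(t, q):
    """Solve Phi(t) C'(t) = q(t) for C'(t) by Cramer's rule (2x2 case of (16.37))."""
    (p11, p12), (p21, p22) = fundamental(t)
    q1, q2 = q(t)
    det = p11 * p22 - p12 * p21  # the Wronskian, nonzero for a fundamental system
    return [(q1 * p22 - p12 * q2) / det, (p11 * q2 - p21 * q1) / det]


def particular(t, q, n=2000):
    """phi_p(t) = sum_k C_k(t) phi_k(t) with C_k(t) = int_0^t C_k'(s) ds (Simpson rule)."""
    h = t / n  # n must be even
    c = [0.0, 0.0]
    for k in range(n + 1):
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        cp = c_prime(k * h, q)
        c = [c[0] + weight * cp[0], c[1] + weight * cp[1]]
    c = [ci * h / 3 for ci in c]
    phi = fundamental(t)
    return [phi[0][0] * c[0] + phi[0][1] * c[1], phi[1][0] * c[0] + phi[1][1] * c[1]]
```

For the assumed inhomogeneity $\mathbf{q}(t) = (0, 1)$, which corresponds to $y'' + y = 1$, one finds $C_1 = \cos t - 1$ and $C_2 = \sin t$ by hand, so `particular(t, lambda s: (0.0, 1.0))` should return approximately $(1 - \cos t, \sin t)$.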


Exercises 


1. Suppose $\varphi_1(x)$, $\varphi_2(x)$ to be two solutions of the ODE $y'' + a_1 y' + a_2 y = 0$
on an interval $I$ containing a point $x_0$. Show that

$$W(\varphi_1, \varphi_2)(x) = e^{-a_1(x - x_0)}\, W(\varphi_1, \varphi_2)(x_0).$$

Solution: We have $\varphi_1'' + a_1\varphi_1' + a_2\varphi_1 = 0$ and $\varphi_2'' + a_1\varphi_2' + a_2\varphi_2 = 0$.
Multiplying the first equation by $-\varphi_2$ and the second by $\varphi_1$, and
adding, we obtain

$$(\varphi_1\varphi_2'' - \varphi_1''\varphi_2) + a_1(\varphi_1\varphi_2' - \varphi_1'\varphi_2) = 0.$$

Note that $W = \varphi_1\varphi_2' - \varphi_1'\varphi_2$ and $W' = \varphi_1\varphi_2'' - \varphi_1''\varphi_2$. Thus $W$
satisfies the first-order equation:

$$W' + a_1 W = 0,$$

which implies $W(x) = c\,e^{-a_1 x}$, in which $c$ is some constant. Setting
$x = x_0$, we have $c = e^{a_1 x_0} W(x_0)$, and thus

$$W(x) = e^{-a_1(x - x_0)}\, W(x_0). \quad ♣$$
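The result of Exercise 1 is easy to check on a concrete equation. The sketch below assumes the illustrative case $a_1 = 3$, $a_2 = 2$, for which $\varphi_1 = e^{-x}$ and $\varphi_2 = e^{-2x}$ are solutions; the equation and the function names are chosen here only for the check.

```python
import math

A1, A2 = 3.0, 2.0  # y'' + 3 y' + 2 y = 0 (illustrative coefficients)

def phi1(x):
    return math.exp(-x)

def dphi1(x):
    return -math.exp(-x)

def phi2(x):
    return math.exp(-2.0 * x)

def dphi2(x):
    return -2.0 * math.exp(-2.0 * x)

def wronskian(x):
    """W = phi1 * phi2' - phi1' * phi2, evaluated directly."""
    return phi1(x) * dphi2(x) - dphi1(x) * phi2(x)

def wronskian_from_formula(x, x0):
    """W(x) = exp(-a1 (x - x0)) W(x0), the formula proved in Exercise 1."""
    return math.exp(-A1 * (x - x0)) * wronskian(x0)

if __name__ == "__main__":
    print(wronskian(1.2), wronskian_from_formula(1.2, 0.4))
```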


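2. Assume an $n$-dimensional linear homogeneous system $\boldsymbol{y}'(x) = A(x)\boldsymbol{y}(x)$
on $I = (a, b)$, and let $\{\boldsymbol{\varphi}_i(x)\}$ be any $n$ solutions. Show that the Wronskian
of $\{\boldsymbol{\varphi}_i(x)\}$ is given by

$$W(x) = W(x_0) \exp\left[\int_{x_0}^{x} \mathrm{tr}A(s)\,ds\right] \quad (x_0 \in I), \tag{16.39}$$

which is called the Liouville formula.

Solution: We show that $W(x)$ satisfies the differential equation $W'(x) =
\mathrm{tr}A(x)\,W(x)$, from which the conclusion (16.39) follows. The expansion
by cofactors of $W(x)$ along the $i$th row yields

$$W(x) = \sum_{j=1}^{n} \varphi_{ij}(x)\Delta_{ij}(x), \tag{16.40}$$

where $\varphi_{ij}(x)$ is the $i$th element of $\boldsymbol{\varphi}_j(x)$ and $\Delta_{ij}(x)$ is the cofactor
of $W(x)$ (see Sect. 18.1.7 for the definition of the cofactor). Note that
$\Delta_{ij}(x)$ does not contain the term $\varphi_{ij}(x)$. Hence, if $W(x)$ given in
(16.40) is regarded as a function of the $\varphi_{ij}(x)$, we have $\partial W/\partial\varphi_{ij} =
\Delta_{ij}(x)$ and, by the chain rule,

$$W'(x) = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial W}{\partial \varphi_{ij}}\varphi_{ij}'(x) = \sum_{i=1}^{n}\left[\sum_{j=1}^{n} \varphi_{ij}'(x)\Delta_{ij}(x)\right]. \tag{16.41}$$

We define

$$W_i(x) = \det\begin{pmatrix} \varphi_{11}(x) & \cdots & \varphi_{1n}(x) \\ \vdots & & \vdots \\ \varphi_{i1}'(x) & \cdots & \varphi_{in}'(x) \\ \vdots & & \vdots \\ \varphi_{n1}(x) & \cdots & \varphi_{nn}(x) \end{pmatrix},$$

where all the elements in the $i$th row are differentiated. Then, the
expression in the square brackets in (16.41) is the expansion of $W_i(x)$
by cofactors, so that

$$W'(x) = \sum_{i=1}^{n} W_i(x). \tag{16.42}$$

Furthermore, since $\varphi_{ij}'(x) = \sum_{k=1}^{n} a_{ik}(x)\varphi_{kj}(x)$, we have

$$W_i(x) = \det\begin{pmatrix} \varphi_{11}(x) & \cdots & \varphi_{1n}(x) \\ \vdots & & \vdots \\ \sum_{k} a_{ik}\varphi_{k1}(x) & \cdots & \sum_{k} a_{ik}\varphi_{kn}(x) \\ \vdots & & \vdots \\ \varphi_{n1}(x) & \cdots & \varphi_{nn}(x) \end{pmatrix}.$$

Multiply the $k$th row ($k \neq i$) of this matrix by $-a_{ik}(x)$ and then
add it to the $i$th row. This process does not change the value of the
determinant $W_i(x)$, but gives the relation

$$W_i(x) = \det\begin{pmatrix} \varphi_{11}(x) & \cdots & \varphi_{1n}(x) \\ \vdots & & \vdots \\ a_{ii}\varphi_{i1}(x) & \cdots & a_{ii}\varphi_{in}(x) \\ \vdots & & \vdots \\ \varphi_{n1}(x) & \cdots & \varphi_{nn}(x) \end{pmatrix} = a_{ii}(x)W(x). \tag{16.43}$$

From (16.42) and (16.43), we arrive at the desired result. ♣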
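The Liouville formula (16.39) can be checked numerically for a system with variable coefficients. The sketch below assumes the illustrative matrix $A(x) = \begin{pmatrix} 0 & 1 \\ -1 & -x \end{pmatrix}$, for which $\mathrm{tr}A(x) = -x$ and hence $W(x) = W(0)\,e^{-x^2/2}$; the matrix, the Runge-Kutta integrator, and all names are assumptions made for this example.

```python
import math

def a_matrix(x):
    """Coefficient matrix A(x) of y' = A(x) y (illustrative); tr A(x) = -x."""
    return [[0.0, 1.0], [-1.0, -x]]

def rhs(x, y):
    a = a_matrix(x)
    return [a[0][0] * y[0] + a[0][1] * y[1],
            a[1][0] * y[0] + a[1][1] * y[1]]

def rk4(y0, x0, x1, steps=1000):
    """Classical fourth-order Runge-Kutta integration from x0 to x1."""
    h = (x1 - x0) / steps
    x, y = x0, list(y0)
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + h / 2, [y[i] + h / 2 * k1[i] for i in range(2)])
        k3 = rhs(x + h / 2, [y[i] + h / 2 * k2[i] for i in range(2)])
        k4 = rhs(x + h, [y[i] + h * k3[i] for i in range(2)])
        y = [y[i] + h / 6 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i])
             for i in range(2)]
        x += h
    return y

def wronskian_numeric(x1):
    """Wronskian of the two solutions starting from (1, 0) and (0, 1) at x = 0."""
    s1 = rk4([1.0, 0.0], 0.0, x1)
    s2 = rk4([0.0, 1.0], 0.0, x1)
    return s1[0] * s2[1] - s1[1] * s2[0]

def wronskian_liouville(x1):
    """(16.39) with W(0) = 1 and the integral of tr A(s) = -s equal to -x1^2 / 2."""
    return math.exp(-x1 ** 2 / 2)

if __name__ == "__main__":
    print(wronskian_numeric(1.0), wronskian_liouville(1.0))
```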


16.3 Autonomous Systems of ODEs 


16.3.1 Autonomous System 


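We noted earlier that an $n$th-order ODE reduces to the first-order form:

$$\boldsymbol{y}'(x) = \begin{pmatrix} y_1'(x) \\ \vdots \\ y_n'(x) \end{pmatrix} = \begin{pmatrix} F_1(x; y_1, y_2, \cdots, y_n) \\ \vdots \\ F_n(x; y_1, y_2, \cdots, y_n) \end{pmatrix} = \boldsymbol{F}(x, \boldsymbol{y}), \tag{16.44}$$

where $\boldsymbol{y}(x)$ and $\boldsymbol{F}(x, \boldsymbol{y})$ are $n$-component column vectors. Particularly
important in many applications is the case where $\boldsymbol{F}(x, \boldsymbol{y})$ does not depend
explicitly on $x$. Relevant terminology is given below.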


♠ Autonomous system of ODEs:
A system of first-order ODEs of the form

$$\boldsymbol{y}'(x) = \boldsymbol{F}(\boldsymbol{y})$$

is called an autonomous system, wherein $\boldsymbol{F}$ does not depend explicitly
on the independent variable $x$.

If $\boldsymbol{F}$ does depend explicitly on $x$, the system is said to be nonautonomous.


Example Consider a second-order ODE of the form 


Setting $y_1(x) = u(x)$ and $y_2(x) = u'(x)$, we have an autonomous system such as
10) =F oT = [peli vacey | =FO 
16.3.2 Trajectory 


As a prototype of autonomous systems of ODEs, we consider a two-dimensional
system such that

$$\frac{dy_1}{dt} = f_1(y_1, y_2), \quad \frac{dy_2}{dt} = f_2(y_1, y_2), \tag{16.45}$$

where $y_1(t)$, $y_2(t)$ are unknown functions of $t$ in some interval $I$. We assume
that $f_1(y_1, y_2)$ and $f_2(y_1, y_2)$ are defined in some domain $D$ and satisfy the
Lipschitz condition on both $y_1$ and $y_2$. If $t_0$ is any real number and
$(y_{10}, y_{20}) \in D$ for $y_{10} = y_1(t_0)$ and $y_{20} = y_2(t_0)$, the above hypotheses
guarantee the existence and uniqueness of solutions for (16.45),

$$y_1(t) = \varphi_1(t), \quad y_2(t) = \varphi_2(t),$$

satisfying the initial conditions

$$y_1(t_0) = y_{10}, \quad y_2(t_0) = y_{20}.$$


We now consider a subdomain $R$ of $D$ in which $f_1(y_1, y_2)$ does not vanish.
Then, we have in $R$ the relation

$$\frac{dy_2}{dy_1} = \frac{dy_2}{dt}\frac{dt}{dy_1} = \frac{dy_2/dt}{dy_1/dt} = \frac{f_2(y_1, y_2)}{f_1(y_1, y_2)}, \tag{16.46}$$

which represents a direction field in the $(y_1$-$y_2)$-plane as noted in Sect. 15.1.6.
From the uniqueness theorem, there exists a unique integral curve of (16.46)
in $R$ satisfying the initial conditions. Such an integral curve on the
$(y_1$-$y_2)$-plane is called a trajectory of (16.45).


♠ Theorem: At most one trajectory passes through any point.


Proof This is obvious from the uniqueness of solutions. If two or more
trajectories passed through a common point, then choosing that point as an
initial value point would yield more than one solution, which contradicts
the uniqueness. ♣




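Remark. When the vector field

$$\dot{\boldsymbol{x}} = \begin{pmatrix} \dot{x}_1 \\ \dot{x}_2 \end{pmatrix} = \begin{pmatrix} f_1(x_1, x_2) \\ f_2(x_1, x_2) \end{pmatrix}$$

describes the motion of a point in $R$, the domain $R$ is called a phase space
of the system (16.45).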


16.3.3 Critical Point 


Suppose that the autonomous system (16.45) has a time-independent solution
expressed by

$$\boldsymbol{y}(t) = \boldsymbol{c} \in D,$$

where $\boldsymbol{c} = (c_1, c_2)$ is a constant vector. Then, no other trajectory can pass
through the point $\boldsymbol{c}$ (see the theorem in Sect. 16.3.2). In addition, we
obviously have

$$\boldsymbol{y}'(t) = 0 = \boldsymbol{f}(\boldsymbol{c}).$$

Conversely, if there exists a point $\boldsymbol{c}$ in $R$ for which $\boldsymbol{f}(\boldsymbol{c}) = 0$, then the functions
$\boldsymbol{y}(t) = \boldsymbol{c}$ are solutions of (16.45). The point $\boldsymbol{c}$ is said to be a critical point
(or singular point or point of equilibrium).


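♠ Critical point:
Assume an autonomous system

$$\boldsymbol{y}'(x) = \boldsymbol{F}(\boldsymbol{y}) \quad \text{for } \boldsymbol{y} \in D. \tag{16.47}$$

Then, any point $\boldsymbol{c} \in D$ that gives

$$\boldsymbol{F}(\boldsymbol{c}) = 0$$

is called a critical point of (16.47). Any other point in $D$ is called a
regular point.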


16.3.4 Stability of a Critical Point 


Let us discuss the stability of a critical point of an autonomous system 
(16.45) by analyzing trajectories of its solutions around the critical point. 

We assume throughout that the function $\boldsymbol{F}(\boldsymbol{y})$ is differentiable to first
order on $D$, which guarantees the existence and uniqueness of solutions of
the initial value problem (16.45). Then, the solutions of (16.45) can be
conveniently pictured as curves in the phase space.

Now we consider a solution $\boldsymbol{\varphi}(x)$ of (16.45) that passes through the point
$\boldsymbol{\eta}$ at $x = x_0$, where the distance between $\boldsymbol{\eta}$ and $\boldsymbol{c}$ is small. Let us now follow
the trajectory that starts at a point $\boldsymbol{\eta}$ different from $\boldsymbol{c}$, but near $\boldsymbol{c}$. If the
resulting motion $\boldsymbol{\varphi}$ remains close to the critical point $\boldsymbol{c}$ for $x > x_0$, then the
critical point is said to be stable, but if the solution $\boldsymbol{\varphi}$ tends to return to
the critical point $\boldsymbol{c}$ as $x$ increases to infinity, then the critical point is said
to be asymptotically stable. Finally, if the solution $\boldsymbol{\varphi}$ leaves every small
neighborhood of $\boldsymbol{c}$, the critical point is said to be unstable. More precisely,
we have the following definitions:


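♠ Stability of a critical point:
Let $\boldsymbol{c}$ be a critical point of the autonomous system $\boldsymbol{y}'(x) = \boldsymbol{F}(\boldsymbol{y})$, so
that $\boldsymbol{F}(\boldsymbol{c}) = 0$. The critical point $\boldsymbol{c}$ is called:

(i) stable when, given a positive $\varepsilon$, there exists a $\delta$ so small that
$|\boldsymbol{y}(0) - \boldsymbol{c}| < \delta \Rightarrow |\boldsymbol{y}(x) - \boldsymbol{c}| < \varepsilon$ for all $x > 0$;
(ii) asymptotically stable when, for some $\delta$,
$|\boldsymbol{y}(0) - \boldsymbol{c}| < \delta \Rightarrow \lim_{x \to \infty} |\boldsymbol{y}(x) - \boldsymbol{c}| = 0$;
(iii) strictly stable when it is stable and asymptotically stable;
(iv) neutrally stable when it is stable but not asymptotically stable; and
(v) unstable when it is not stable.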


16.3.5 Linear Autonomous System 


An autonomous system $\boldsymbol{y}' = \boldsymbol{F}(\boldsymbol{y})$ is called linear if and only if all the
elements $F_i$ of $\boldsymbol{F}$ are linear homogeneous functions of the $y_k$, so that

$$\frac{dy_i}{dx} = a_{i1}y_1 + \cdots + a_{in}y_n \quad (i = 1, \cdots, n).$$

Hence, a linear autonomous system is just a (homogeneous) linear system of
ODEs with constant coefficients. The analyses for linear systems are generally
useful since, near a point $\boldsymbol{y} = \boldsymbol{y}_0$, the $F_i(\boldsymbol{y})$ can be replaced by the linear
terms of their Taylor expansions for analyzing the local behavior.

We now discuss in detail the case $n = 2$ of linear plane autonomous systems
of the form $\boldsymbol{y}' = A\boldsymbol{y}$. Any such system is expressed by

$$\frac{dx}{dt} = ax + by, \quad \frac{dy}{dt} = cx + dy, \tag{16.48}$$

where $x = y_1$, $y = y_2$, and

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

with $a, b, c, d$ being constants. Observe that the simultaneous linear equations

$$ax + by = cx + dy = 0$$

have no solution except

$$x = y = 0$$

unless $\det A = 0$. We thus see that the origin is the only critical point of the
system (16.48) unless $ad = bc$.
Relevant terminology is given below.


♠ Secular equation:
If $(x(t), y(t))$ is a solution of (16.48), then $x(t)$ and $y(t)$ satisfy the
equation:

$$u'' - (a + d)u' + (ad - bc)u = 0. \tag{16.49}$$

This equation is called the secular equation of the autonomous system
(16.48).


Proof The first equation of (16.48) says that

$$by = x' - ax,$$

which implies that

$$x'' - ax' = by'.$$

We thus have

$$x'' - ax' = b(cx + dy) = bcx + d(x' - ax),$$

or equivalently,

$$x'' - (a + d)x' + (ad - bc)x = 0.$$

The proof for $y(t)$ is the same, replacing $a$ with $d$ and $b$ with $c$. ♣
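The secular equation (16.49) has an important property associated with the
nature of the critical point. This is seen by introducing the concept of the
characteristic polynomial $P$ of (16.49) as

$$P = \lambda^2 - (a + d)\lambda + (ad - bc) = \det\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = \det(A - \lambda I).$$

If $\lambda_j$ ($j = 1, 2$) are the roots of $P = 0$, then there exist nonzero eigenvectors
$(x_j, y_j)$ such that

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x_j \\ y_j \end{pmatrix} = \begin{pmatrix} ax_j + by_j \\ cx_j + dy_j \end{pmatrix} = \lambda_j\begin{pmatrix} x_j \\ y_j \end{pmatrix}.$$

From this, it follows that the functions

$$e^{\lambda_1 t}\begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \quad \text{and} \quad e^{\lambda_2 t}\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$$

are a basis of vector-valued solutions of (16.48), provided that the two
eigenvectors are linearly independent. We shall see later that the
nature of a critical point of a system is completely determined by the values
of the roots $\lambda_1$, $\lambda_2$.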


16.4 Classification of Critical Points 


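The behavior of trajectories of a linear autonomous system

$$\frac{d}{dt}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = A\begin{pmatrix} x \\ y \end{pmatrix} \tag{16.50}$$

near its critical point depends on the eigenvalues of the matrix $A$, denoted by
$\lambda_1$ and $\lambda_2$. There are five cases to consider and we discuss each in turn.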


16.4.1 Improper Node 


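We first consider the case where $\lambda_1$ and $\lambda_2$ are real, unequal, and of the same
sign. A critical point for this case is called an improper node. When both
eigenvalues are negative, all the trajectories approach the critical point with
increasing $t$, and all of them except those lying along one eigenvector
direction do so tangentially to the same straight line.


Example An example of improper nodes is given by

$$\frac{d}{dt}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} -3 & 0 \\ 0 & -2 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \tag{16.51}$$

The eigenvalues are obviously $\lambda = -3, -2$ and the corresponding eigenvectors
are $(1, 0)$ and $(0, 1)$. The general solution to (16.51) is

$$\begin{pmatrix} u \\ v \end{pmatrix} = c_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}e^{-3t} + c_2\begin{pmatrix} 0 \\ 1 \end{pmatrix}e^{-2t}. \tag{16.52}$$

The trajectories given by (16.52) for several values of $c_1$ and $c_2$ are shown in
Fig. 16.1.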


Fig. 16.1. Trajectories associated with the improper node of the system (16.51)


Remark. If the eigenvalues are real, unequal, and positive (contrary to the 
above example), then the trajectories are similar to those in Fig. 16.1 except 
that the directions of the arrows are reversed; in other words the trajectories 
recede from the critical point and go off toward infinity. 


16.4.2 Saddle Point 


We next consider the case where $\lambda_1$ and $\lambda_2$ are real, unequal, and of
opposite sign. In this case, the trajectories approach the critical point along
one eigenvector direction and recede along the other eigenvector direction.
The critical point in this case is called a saddle point.


Example Assume the system

$$\frac{d}{dt}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} -1 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \tag{16.53}$$

The eigenvalues are $\lambda = -1, 2$, and the corresponding eigenvectors are $(1, 0)$
and $(1, 3)$, respectively. The general solution to (16.53) is

$$\begin{pmatrix} u \\ v \end{pmatrix} = c_1\begin{pmatrix} 1 \\ 0 \end{pmatrix}e^{-t} + c_2\begin{pmatrix} 1 \\ 3 \end{pmatrix}e^{2t}. \tag{16.54}$$

The trajectories given by (16.54) are shown in Fig. 16.2. As (16.54) consists
of an $e^{-t}$ term and an $e^{2t}$ term, the trajectories approach the origin along the
eigenvector direction $(1, 0)$ and recede along the direction $(1, 3)$ as $t$ increases.


Fig. 16.2. Trajectories around the saddle point of the system (16.53)


16.4.3 Proper Node 


We next consider the case of two roots of the characteristic equation being
real and equal. This type of critical point is called a proper node.


Example We consider the critical point of the system

$$\frac{d}{dt}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \tag{16.55}$$

The critical point occurs at the origin, with the degenerate eigenvalue being $a$.
Generally, when the eigenvalue $\lambda$ of the characteristic equation is degenerate,
the solution takes the form

$$u(t) = (c_1 + c_2 t)e^{\lambda t}, \quad v(t) = (c_3 + c_4 t)e^{\lambda t}. \tag{16.56}$$

Hence, we set $\lambda = a$ in (16.56) and substitute the results into (16.55) to obtain
$c_2 = c_4 = 0$. The solution to (16.55) is thus

$$u(t) = c_1 e^{at}, \quad v(t) = c_3 e^{at}. \tag{16.57}$$

Eliminating $t$ from (16.57) yields the expression of the trajectories:

$$v = \frac{c_3}{c_1}u \quad \text{if } c_1 \neq 0$$

and

$$u = 0 \quad \text{if } c_1 = 0,$$

both of which are depicted in Fig. 16.3. The trajectories approach or recede
from the origin, depending on the sign of $a$.


Fig. 16.3. Trajectories around the proper node of the system (16.55)


16.4.4 Spiral Point 


So far, we have restricted our attention to cases where the two eigenvalues 
are real. Now we consider the case in which the two eigenvalues are complex 
conjugates of each other. The corresponding critical point is called a spiral 
point or a focus. 


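Example An example for this case is

$$\frac{d}{dt}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 0 & -1 \\ 2 & -2 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \tag{16.58}$$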


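The critical point is at the origin, and the eigenvalues are $\lambda = -1 \pm i$ with
corresponding eigenvectors $(1, 1 \mp i)$. The general solution to this system is

$$\begin{pmatrix} u \\ v \end{pmatrix} = c_1\begin{pmatrix} 1 \\ 1 - i \end{pmatrix}e^{(-1+i)t} + c_2\begin{pmatrix} 1 \\ 1 + i \end{pmatrix}e^{(-1-i)t}. \tag{16.59}$$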


The result represents a family of curves that spiral into the critical point as 
t increases. Real components of the solutions u(t) and v(t) given by (16.59) 
are plotted in Fig. 16.4. 


16.4.5 Center 


The final class of critical points is called a center, for which the two eigenval- 
ues are pure imaginary. In this case, trajectories consist of a family of closed 
loops centered about the critical point. 


Fig. 16.4. Trajectories for the spiral point of the system (16.58)


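Example Consider the system

$$\frac{d}{dt}\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 1 & 2 \\ -1 & -1 \end{pmatrix}\begin{pmatrix} u \\ v \end{pmatrix}. \tag{16.60}$$

The eigenvalues and corresponding eigenvectors are $\lambda = \pm i$ and $(1 \pm i, -1)$,
and the general solution reads

$$\begin{pmatrix} u \\ v \end{pmatrix} = c_1\begin{pmatrix} 1 + i \\ -1 \end{pmatrix}e^{it} + c_2\begin{pmatrix} 1 - i \\ -1 \end{pmatrix}e^{-it}. \tag{16.61}$$

Figure 16.5 shows several trajectories for different values of $c_1$ and $c_2$. All the
trajectories represent periodic motion about the critical point.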
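The five cases of Sects. 16.4.1-16.4.5 can be summarized in a short routine that inspects the roots of the characteristic polynomial. The following Python sketch is only an illustration: the function names, the tolerance, and the label returned when $\det A = 0$ are assumptions made here, and the case names follow the terminology of this section.

```python
import cmath

def eigenvalues(a, b, c, d):
    """Roots of lambda^2 - (a + d) lambda + (ad - bc) = 0."""
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

def classify_critical_point(a, b, c, d, tol=1e-12):
    """Classify the origin of x' = ax + by, y' = cx + dy (system (16.48))."""
    trace, det = a + d, a * d - b * c
    if abs(det) < tol:
        return "degenerate (det A = 0)"  # origin is not an isolated critical point
    disc = trace * trace - 4.0 * det
    if disc > tol:
        # real, unequal roots: same sign if det > 0, opposite sign if det < 0
        return "saddle point" if det < 0 else "improper node"
    if abs(disc) <= tol:
        return "proper node"  # real and equal roots, as named in Sect. 16.4.3
    # complex-conjugate roots: purely imaginary if the trace vanishes
    return "center" if abs(trace) < tol else "spiral point"

if __name__ == "__main__":
    for m in [(-3, 0, 0, -2), (-1, 1, 0, 2), (2, 0, 0, 2), (1, 2, -1, -1)]:
        print(m, eigenvalues(*m), classify_critical_point(*m))
```

Working with the trace and determinant avoids computing eigenvectors, which are not needed to decide the type of the critical point.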


16.4.6 Limit Cycle 


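Before closing this section, we have one more topic to discuss. Consider the
system

$$x' = x + y - x(x^2 + y^2), \quad y' = -x + y - y(x^2 + y^2). \tag{16.62}$$

The only critical point is at the origin. Letting $x = r\cos\theta$ and $y = r\sin\theta$, the
system (16.62) becomes

$$r' = r(1 - r^2) \quad \text{and} \quad \theta' = -1. \tag{16.63}$$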


Fig. 16.5. Periodic trajectories for the center of the system (16.60)


The system (16.63) has a trivial solution of $r = 1$, $\theta = -t + \text{const}$, which
represents periodic clockwise motion around the unit circle. We can find other
solutions by solving (16.63). The equation for $r$ is easy to solve, yielding

$$r(t) = \frac{1}{\sqrt{1 + \left(\dfrac{1}{r_0^2} - 1\right)e^{-2t}}},$$

where $r(0) = r_0$. Figure 16.6 shows $r(t)$ plotted against $t$ for $r_0 > 1$ and
$r_0 < 1$. The trajectories spiral in toward the unit circle as $t \to \infty$ if $r_0 > 1$
and they spiral out toward the unit circle as $t \to \infty$ if $r_0 < 1$. Hence, all the
trajectories other than the critical point itself approach the unit circle as
$t \to \infty$.

Fig. 16.6. Converging behavior of solutions of the system (16.62) to a limit cycle

The unit circle mentioned above is called a limit cycle. Limit cycles are 
important for determining the stability of the system, since the existence of a 
limit cycle ensures the existence of periodic solutions to a system. 
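The approach to the limit cycle can be reproduced with a few lines of code. The sketch below integrates the radial equation $r' = r(1 - r^2)$ of (16.63) with a Runge-Kutta scheme and compares the result with the closed-form solution $r(t) = [1 + (r_0^{-2} - 1)e^{-2t}]^{-1/2}$; the function names and step counts are assumptions made for this illustration.

```python
import math

def r_exact(t, r0):
    """Closed-form solution of r' = r (1 - r^2) with r(0) = r0 > 0."""
    return 1.0 / math.sqrt(1.0 + (1.0 / r0 ** 2 - 1.0) * math.exp(-2.0 * t))

def r_numeric(t, r0, steps=10000):
    """Fourth-order Runge-Kutta integration of the same radial equation."""
    def f(r):
        return r * (1.0 - r * r)
    h = t / steps
    r = r0
    for _ in range(steps):
        k1 = f(r)
        k2 = f(r + 0.5 * h * k1)
        k3 = f(r + 0.5 * h * k2)
        k4 = f(r + h * k3)
        r += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return r

if __name__ == "__main__":
    for r0 in (0.1, 3.0):  # one start inside and one outside the unit circle
        print(r0, r_numeric(5.0, r0), r_exact(5.0, r0))
```

Starting radii on either side of $r = 1$ are used so that both the outward and the inward spiral are covered.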


Exercises 


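1. Consider the system given by

$$x' = e^{x} + \sin 5y - \cos 2y,$$
$$y' = x + 2\sin y.$$

Find the equilibrium point and describe the stability of the system around
the point.


Solution: The origin $x = 0$, $y = 0$ is an equilibrium point, since both
right-hand sides vanish there. Expanding all functions on the right-hand
side around $x = 0$, $y = 0$, we have

$$x' = x + 5y + g(x, y),$$
$$y' = x + 2y + h(x, y).$$

The functions $g$, $h$ converge to zero faster than $\sqrt{x^2 + y^2}$, and the
characteristic equation becomes

$$\det\begin{pmatrix} -\lambda + 1 & 5 \\ 1 & -\lambda + 2 \end{pmatrix} = \lambda^2 - 3\lambda - 3 = 0,$$

whose roots are $(3 \pm \sqrt{21})/2$. One of these is positive and the other is
negative, so the origin is a saddle point and the system is unstable. ♣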
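The linearization used in this exercise can be verified numerically. The sketch below estimates the Jacobian of the right-hand side at the origin by central differences and computes its eigenvalues; the helper names and the step size are assumptions introduced for this check.

```python
import math

def field(x, y):
    """Right-hand side of the system in Exercise 1."""
    return (math.exp(x) + math.sin(5.0 * y) - math.cos(2.0 * y),
            x + 2.0 * math.sin(y))

def jacobian_at_origin(h=1e-6):
    """Central-difference estimate of the Jacobian of `field` at (0, 0)."""
    fxp, fxm = field(h, 0.0), field(-h, 0.0)
    fyp, fym = field(0.0, h), field(0.0, -h)
    return [[(fxp[0] - fxm[0]) / (2 * h), (fyp[0] - fym[0]) / (2 * h)],
            [(fxp[1] - fxm[1]) / (2 * h), (fyp[1] - fym[1]) / (2 * h)]]

def real_eigenvalues_2x2(m):
    """Real roots of the characteristic equation of a 2x2 matrix, in increasing order."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = trace * trace - 4.0 * det
    if disc < 0:
        raise ValueError("eigenvalues are complex")
    s = math.sqrt(disc)
    return (trace - s) / 2.0, (trace + s) / 2.0

if __name__ == "__main__":
    print(field(0.0, 0.0), jacobian_at_origin(),
          real_eigenvalues_2x2(jacobian_at_origin()))
```

Eigenvalues of opposite sign confirm that the origin is a saddle point.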


16.5 Applications in Physics and Engineering 


16.5.1 Van der Pol Generator 


As a physical example of a system in which a limit cycle may occur, we
consider the following electric circuit consisting of a coil with inductance $L$
and a condenser with capacitance $C$ attached to a tunnel diode. A tunnel
diode is a nonlinear element in the sense that it exhibits nonlinear
current-voltage characteristics:

$$I(V) = I_0 - a(V - V_0) + b(V - V_0)^3. \tag{16.64}$$

It follows from Fig. 16.7 that a tunnel diode behaves like an ordinary resistor
at low and high voltages, but like a negative resistor at intermediate voltages.

Fig. 16.7. Left: An electric circuit consisting of a coil with inductance $L$, a condenser
with capacitance $C$, and a tunnel diode. Right: Plot of the nonlinear current-voltage
characteristics $I(V)$ of the tunnel diode (horizontal axis: voltage $V$; vertical axis:
current $I$; plotted for $V_0 = 2.0$, $I_0 = 4.0$)


Thus, a tunnel diode is expected to amplify small oscillations in the system, 
provided we choose the parameters in an appropriate manner. 

The equation of motion for the LC circuit attached to a tunnel diode is
obtained as described below. The law of conservation of current
ensures that

$$I_L + I(V) + I_C = 0, \tag{16.65}$$

where

$$I_L = \frac{1}{L}\int V\,dt \quad \text{and} \quad I_C = C\frac{dV}{dt}. \tag{16.66}$$

Substituting (16.64) and (16.66) into (16.65) and then differentiating with
respect to time, we get

$$\frac{d^2V}{dt^2} + \frac{1}{C}\left[-a + 3b(V - V_0)^2\right]\frac{dV}{dt} + \omega_0^2 V = 0,$$

where we introduce the resonant frequency $\omega_0$ defined by the equation $\omega_0^2 =
1/(LC)$. For simplicity, we define a new variable


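$$x \equiv \frac{V - V_0}{V_0},$$

for which

$$\ddot{x} - \alpha(1 - \beta x^2)\dot{x} + \omega_0^2 x = 0 \quad \left(\dot{x} \equiv \frac{dx}{dt}\right) \tag{16.67}$$

with

$$\alpha = \frac{a}{C}, \quad \beta = \frac{3bV_0^2}{a}.$$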


Fig. 16.8. Trajectories for the Van der Pol equation (16.68) in the phase plane
(horizontal axis: $x(t)$; vertical axis: $dx(t)/dt$) with the parameter
$\mu = 0.3$ for (a) and $\mu = 3.0$ for (b)


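Equation (16.67) can be further simplified by replacing $x$ by $x/\sqrt{\beta}$ and
introducing a new time variable $\omega_0 t$, which we again denote by $t$. Hence, we
finally obtain

$$\ddot{x} - \mu(1 - x^2)\dot{x} + x = 0 \tag{16.68}$$

with the following key parameter:

$$\mu = \frac{\alpha}{\omega_0}.$$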
The nonlinear differential equation (16.68) is known as the Van der Pol 
equation. As shown below, it describes self-sustaining oscillations in which 
energy is supplied to small oscillations and removed from large oscillations, 
which gives rise to the limit cycle in the phase space. 

We can observe the self-exciting behavior of the system governed by (16.68)
in the phase space plot in Fig. 16.8, where we set $\mu = 0.3$ and $\mu = 3.0$ for
various initial points $\boldsymbol{x}_0 = (x(t = 0), \dot{x}(t = 0))$. We see that all the trajectories
starting at a point $\boldsymbol{x}_0$ inside (or outside) a closed contour $C$ move outward (or
inward) as $t$ increases and then converge to the contour $C$; such a characteristic
closed contour is known as the limit cycle of the system. The shape of the limit
cycle depends on the value of $\mu$, as is evident from Fig. 16.8.
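The convergence to the limit cycle can be reproduced by direct numerical integration of (16.68). The following Python sketch rewrites the equation as the first-order system $\dot{x} = v$, $\dot{v} = \mu(1 - x^2)v - x$ and integrates it with a Runge-Kutta scheme; the step size, integration time, initial points, and function names are assumptions chosen for illustration and are not the settings used for Fig. 16.8.

```python
def vdp_rhs(x, v, mu):
    """Van der Pol equation (16.68) as a first-order system in (x, v = dx/dt)."""
    return v, mu * (1.0 - x * x) * v - x

def simulate(x0, v0, mu, t_end=200.0, dt=0.01):
    """Classical fourth-order Runge-Kutta integration; returns the list of (x, v)."""
    steps = int(round(t_end / dt))
    x, v = x0, v0
    path = [(x, v)]
    for _ in range(steps):
        k1x, k1v = vdp_rhs(x, v, mu)
        k2x, k2v = vdp_rhs(x + 0.5 * dt * k1x, v + 0.5 * dt * k1v, mu)
        k3x, k3v = vdp_rhs(x + 0.5 * dt * k2x, v + 0.5 * dt * k2v, mu)
        k4x, k4v = vdp_rhs(x + dt * k3x, v + dt * k3v, mu)
        x += dt * (k1x + 2.0 * k2x + 2.0 * k3x + k4x) / 6.0
        v += dt * (k1v + 2.0 * k2v + 2.0 * k3v + k4v) / 6.0
        path.append((x, v))
    return path

def late_amplitude(path, samples=2000):
    """Largest |x| over the last `samples` steps, after transients have decayed."""
    return max(abs(x) for x, _ in path[-samples:])

if __name__ == "__main__":
    inner = simulate(0.1, 0.0, mu=0.3)  # starts inside the limit cycle
    outer = simulate(4.0, 0.0, mu=0.3)  # starts outside the limit cycle
    print(late_amplitude(inner), late_amplitude(outer))
```

Both runs settle to nearly the same amplitude, which is the numerical signature of a single attracting closed contour.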


17 


Partial Differential Equations 


Abstract Broadly speaking, there are three classes of partial differential equations 
that are relevant to mathematical physics, as reflected in the section titles of this 
chapter. After examining the basic properties common to all the abovementioned 
classes of equations, we devote the balance of this chapter to a discussion of the 
mathematical essence of each class. 


17.1 Basic Properties 


17.1.1 Definitions 


In this section we present the basic theory of partial differential equations
(PDEs), an understanding of which is crucial for describing or predicting
natural phenomena. The formal definition is given below.


♠ Partial differential equations:
A partial differential equation of order $r$ is a functional equation of the
form

$$F\left(x_1, x_2, \cdots, x_n, u, \frac{\partial u}{\partial x_1}, \frac{\partial u}{\partial x_2}, \cdots, \frac{\partial^2 u}{\partial x_1^2}, \cdots\right) = 0, \tag{17.1}$$

which involves at least one $r$th-order partial derivative of the unknown
function $u = u(x_1, x_2, \cdots, x_n)$ of independent variables $x_1, x_2, \cdots, x_n$.

In this chapter we often denote partial derivatives with subscripts such as

$$\frac{\partial u}{\partial x} = u_x = \partial_x u, \quad \frac{\partial^2 u}{\partial x \partial y} = u_{xy} = \partial_x\partial_y u.$$


We also use the shorthand

$$\partial_i = \frac{\partial}{\partial x_i}, \quad \partial_i\partial_j = \frac{\partial^2}{\partial x_i \partial x_j}.$$

Then, the general form (17.1) of a PDE is expressed as

$$F(x, y, \cdots, u, u_x, \cdots, u_{xx}, u_{xy}, \cdots) = 0, \tag{17.2}$$

where $u = u(x, y, \cdots)$ is the unknown function of independent variables
$x, y, \cdots$. A solution (or integral) of a PDE is a function $\varphi(x, y, \cdots)$
satisfying equation (17.2) identically, at least in some region of the
independent variables $x, y, \cdots$.


17.1.2 Subsidiary Conditions 


The general solution of (17.1) depends on an arbitrary function. This state- 
ment is valid even for higher-order PDEs, indicating that a PDE has in gen- 
eral many solutions. Hence, in order to determine a unique solution, auxiliary 
conditions must be imposed. Such conditions are usually called initial con- 
ditions on time or boundary conditions for positions. 


Initial condition:

In physics, an unknown function in a PDE usually involves independent
variables of time $t$ and position $x, y, \cdots$. Initial conditions are imposed
at a particular (initial) time $t = t_0$ on the unknown function and/or its
time derivatives.


Boundary condition:

Boundary conditions are imposed on an unknown function at the boundary,
or at infinity, of a domain $D$ in which the PDE is valid and are classified
into two cases:

1. Dirichlet condition is the case in which an unknown function $u$ is
specified on the boundary of the domain $D$ (often denoted by $\partial D$),
where $u$ is a function of time $t$ and position $x, y, \cdots$.

2. Neumann condition is the case in which the normal derivative of an
unknown function, $\partial u/\partial n$, is specified on $\partial D$.


17.1.3 Linear and Homogeneous PDEs 


A PDE is called linear if and only if the $F$ of (17.1) is a linear function of $u$
and its derivatives. First we assume a first-order PDE with two independent
variables $x$ and $y$, whose general form reads

$$F(x, y, u, u_x, u_y) = 0. \tag{17.3}$$

Then, if it is linear, (17.3) can be expressed by

$$a(x, y)u_x + b(x, y)u_y + c(x, y)u = g(x, y), \tag{17.4}$$

where $a$, $b$, $c$, and $g$ are given functions of $x$, $y$. Using the operator $L$, we
express (17.4) by a simple form such that

$$Lu(x, y) = g(x, y), \tag{17.5}$$

where the operator $L$ is defined by

$$L = a(x, y)\partial_x + b(x, y)\partial_y + c(x, y).$$

The linearity of PDEs guarantees that, for any functions $u$, $v$ and any constant
$c$, the following relations hold:

$$L(u + v) = Lu + Lv, \quad L(cu) = cL(u).$$


Examples

$u_{xx} - e^{-x}u_{yy} = 0$ (linear)
$u_{xx} - e^{-x}u_{yy} = \sin x$ (linear)
$uu_x + u_y = 0$ (nonlinear)
$xu_x + yu_y + u^2 = 0$ (nonlinear)


A linear equation is said to be homogeneous if every term of the equation
contains either the dependent variable $u$ or one of its derivatives $u_x, u_y, \cdots$,
so that no term depends on the independent variables $x, y, \cdots$ alone. For
instance, the PDE (17.5) is homogeneous if

$$g(x, y) = 0,$$

since every term of the equation

$$Lu(x, y) = 0 \tag{17.6}$$

contains $u$, $u_x$, or $u_y$. On the other hand, if $g \neq 0$ in (17.5), it
is called an inhomogeneous (or nonhomogeneous) linear equation. These
statements are generally valid even for higher-order PDEs.


17.1.4 Characteristic Equation 
We consider a first-order homogeneous linear PDE of the form

$$a(x, y)\partial_x u(x, y) + b(x, y)\partial_y u(x, y) = 0, \tag{17.7}$$

which is the simplest (and thus pedagogical) class of PDEs. In general,
solutions of PDEs are described by arbitrary functions $f(p)$ of a particular
variable $p$, wherein $p = p(x, y)$ is some combination of the independent
variables $x$ and $y$. We verify this statement for the case of (17.7).


By the chain rule on the derivative, we have

$$\frac{\partial u}{\partial x} = \frac{\partial p}{\partial x}\frac{df}{dp}, \quad \frac{\partial u}{\partial y} = \frac{\partial p}{\partial y}\frac{df}{dp}. \tag{17.8}$$

Hence, the PDE (17.7) can be rewritten in the form

$$\left[a(x, y)\frac{\partial p}{\partial x} + b(x, y)\frac{\partial p}{\partial y}\right]\frac{df}{dp} = 0. \tag{17.9}$$

This implies that the function form of $f(p)$ may be arbitrary if $p = p(x, y)$
satisfies the equation

$$a(x, y)\frac{\partial p}{\partial x} + b(x, y)\frac{\partial p}{\partial y} = 0. \tag{17.10}$$

Therefore, an arbitrary function $f(p)$ such that $p$ satisfies (17.10) serves
as the solution of the original PDE (17.7). [The case of $df/dp = 0$ gives a
trivial solution of $f(p) = u(x, y) = \text{const}$, which we omit below.]

To obtain the solution $p = p(x, y)$ of the equation (17.10), we tentatively
suppose that the function $p = p(x, y)$ takes a constant value along a curve
$C$: $y = y(x)$ on the $(x$-$y)$-plane. Then, the total derivative of $p$ on the curve
$C$ should vanish, so that

$$dp = \frac{\partial p}{\partial x}dx + \frac{\partial p}{\partial y}dy = 0. \tag{17.11}$$

From the correspondence between (17.10) and (17.11), we see that these are
equivalent provided that

$$\frac{dy}{dx} = \frac{b(x, y)}{a(x, y)}, \quad a(x, y) \neq 0. \tag{17.12}$$

This is called the characteristic equation of the PDE (17.7) and its solution
$y = y(x)$ is the characteristic curve of (17.7). From (17.11) and (17.12),
therefore, we obtain the desired function form of $p = p(x, y)$ that makes an
arbitrary function $f(p)$ the solution of the original PDE.


Examples We evaluate a general solution for (17.7) in the case that $a$, $b$ are
constant and nonzero coefficients. From (17.11) and (17.12), we obtain the
characteristic curve (line) $p = bx - ay$. Then a general solution takes the form

$$u(x, y) = f(p) = f(bx - ay), \tag{17.13}$$

where $f$ is an arbitrary function. The solution can be easily checked by
taking derivatives using (17.8) and substituting those into (17.7). A less trivial
example is given in Exercise 1.
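The check mentioned above can also be carried out numerically. The sketch below assumes the illustrative values $a = 2$, $b = 3$ and the profile $f(p) = \sin p + p^2$, evaluates the left-hand side of (17.7) by central differences, and confirms that $u$ is constant along a characteristic line; all names and parameter values are assumptions made for this example.

```python
import math

A, B = 2.0, 3.0  # constant coefficients a, b of (17.7); illustrative values

def f(p):
    """An arbitrary differentiable profile; any smooth function would do."""
    return math.sin(p) + p * p

def u(x, y):
    """General solution (17.13): u(x, y) = f(b x - a y)."""
    return f(B * x - A * y)

def pde_residual(x, y, h=1e-5):
    """a u_x + b u_y evaluated with central differences; should be close to zero."""
    ux = (u(x + h, y) - u(x - h, y)) / (2.0 * h)
    uy = (u(x, y + h) - u(x, y - h)) / (2.0 * h)
    return A * ux + B * uy

def along_characteristic(x, y, s):
    """Value of u after moving a parameter distance s along the direction (a, b)."""
    return u(x + A * s, y + B * s)

if __name__ == "__main__":
    print(pde_residual(0.3, -0.7), u(0.3, -0.7), along_characteristic(0.3, -0.7, 1.5))
```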


17.1.5 Second-Order PDEs 


The general form of second-order linear PDEs is

$$\sum_{i,j=1}^{n} a_{ij}(x_1, x_2, \cdots, x_n)\partial_i\partial_j u + \sum_{j=1}^{n} a_j(x_1, x_2, \cdots, x_n)\partial_j u + a_0(x_1, x_2, \cdots, x_n)u = g(x_1, x_2, \cdots, x_n), \tag{17.14}$$

where the unknown function $u$ depends on $n$ independent variables denoted
by $x_1, x_2, \cdots, x_n$. Note that we may take $a_{ij} = a_{ji}$ since the mixed
derivatives are equal. The form of (17.14) represents a very large class of
PDEs. Among them, we restrict our attention to the case $g = 0$ with real
constant coefficients, namely, second-order linear homogeneous PDEs. The
general form of linear PDEs of second order involving $n$ independent variables
with real constant coefficients is written as

$$\sum_{i,j=1}^{n} a_{ij}\partial_i\partial_j u + \sum_{j=1}^{n} a_j\partial_j u + a_0 u = 0. \tag{17.15}$$
i,j=l j=l 
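The linear transformation of independent variables $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)$
to $\boldsymbol{y} = (y_1, y_2, \cdots, y_n)$ is given by

$$\boldsymbol{y} = B\boldsymbol{x}, \tag{17.16}$$

or equivalently,

$$y_k = \sum_{m=1}^{n} b_{km}x_m,$$

where the $b_{km}$ are elements of the $n \times n$ matrix $B$. Using the chain rule on
the derivative, we have

$$\frac{\partial}{\partial x_i} = \sum_{k=1}^{n} b_{ki}\frac{\partial}{\partial y_k}$$

and

$$\frac{\partial^2 u}{\partial x_i \partial x_j} = \left(\sum_{k=1}^{n} b_{ki}\frac{\partial}{\partial y_k}\right)\left(\sum_{m=1}^{n} b_{mj}\frac{\partial}{\partial y_m}\right)u. \tag{17.17}$$

Hence, the first term of (17.15) is converted to

$$\sum_{i,j=1}^{n} a_{ij}\partial_i\partial_j u = \sum_{k,m=1}^{n}\left(\sum_{i,j=1}^{n} b_{ki}a_{ij}b_{mj}\right)\frac{\partial^2 u}{\partial y_k \partial y_m},$$

which leads to the relation

$$a_{km} \to \sum_{i,j=1}^{n} b_{ki}a_{ij}b_{mj}. \tag{17.18}$$

Thus we obtain the PDE with new variables $y_1, y_2, \cdots, y_n$ by the
transformation $A \to BAB^t$, where $B^t$ is the transpose of $B$.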

The appropriate choice of the matrix $B$ makes it possible to diagonalize
$A$ such that the transformed coefficient matrix becomes

$$\mathrm{diag}(c_1, c_2, \cdots, c_n),$$

where $c_1, c_2, \cdots, c_n$ are the real eigenvalues of the symmetric matrix $A$. Thus,
any PDE of the form (17.15) can be converted into a PDE with diagonal
coefficients in terms of a linear transformation of a set of independent
variables such as

$$\sum_{i=1}^{n} c_i\frac{\partial^2 u}{\partial y_i^2} + \sum_{j=1}^{n} d_j\frac{\partial u}{\partial y_j} + a_0 u = 0. \tag{17.19}$$


♠ Theorem:
By linear transformation of independent variables, the equation (17.15)
can be reduced to the canonical form (17.19).


17.1.6 Classification of Second-Order PDEs 


We can classify the types of PDEs depending on the positive or negative values
of the coefficients $c_1, c_2, \cdots, c_n$ in (17.19) for the case $d_j = 0$.


1. Elliptic case:
If all the eigenvalues $c_1, c_2, \cdots, c_n$ are positive or all are negative, the
PDE is called elliptic. A simple example is given by

$$\frac{\partial^2 u}{\partial y_1^2} + \frac{\partial^2 u}{\partial y_2^2} = 0.$$


2. Hyperbolic case:
In this case none of the $\{c_i\}$, $i = 1, 2, \cdots, n$, vanish and one of them has
the opposite sign from the other $n - 1$. For example,

$$\frac{\partial^2 u}{\partial y_1^2} - \frac{\partial^2 u}{\partial y_2^2} = 0.$$


3. Parabolic case:
If one of the $\{c_i\}$, $i = 1, 2, \cdots, n$, is zero and all the others have the same
sign, the PDE is parabolic.
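The three cases above depend only on the signs of the coefficients $c_1, \cdots, c_n$, so the classification can be written as a short routine. The Python sketch below is an illustration under the assumption that the canonical coefficients are already known; the function name, the tolerance, and the label "other" for sign patterns not covered by the three cases are choices made here.

```python
def classify_pde(coeffs, tol=1e-12):
    """Classify a PDE in canonical form (17.19) with d_j = 0 from its c_i."""
    n = len(coeffs)
    pos = sum(1 for c in coeffs if c > tol)
    neg = sum(1 for c in coeffs if c < -tol)
    zero = n - pos - neg
    if zero == 0 and (pos == n or neg == n):
        return "elliptic"    # all c_i nonzero with the same sign
    if zero == 0 and (pos == 1 or neg == 1):
        return "hyperbolic"  # one c_i has the opposite sign from the other n - 1
    if zero == 1 and (pos == n - 1 or neg == n - 1):
        return "parabolic"   # one c_i vanishes, the others share a sign
    return "other"

if __name__ == "__main__":
    print(classify_pde([1, 1, 1]))       # Laplace equation in three variables
    print(classify_pde([1, -1, -1, -1])) # wave equation written as u_tt - (Laplacian of u) = 0
    print(classify_pde([0, 1, 1, 1]))    # diffusion equation: no second derivative in t
```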


Below are basic PDEs in physics classified by the definition given above:

Laplace equation (elliptic): $\Delta_n u = 0$, (17.20)
Wave equation (hyperbolic): $u_{tt} = \Delta_n u$. (17.21)

Here $\Delta_n$ means the Laplacian defined by $\Delta_n = \partial_1^2 + \partial_2^2 + \cdots + \partial_n^2$. The other
important equation takes the form

$$u_t = \Delta_n u, \tag{17.22}$$

which we call the diffusion equation; it is parabolic. The diffusion equation
differs from the wave equation in that the wave equation is invariant under
the time reversal $t \to -t$, whereas the diffusion equation is not. All
of these equations (17.20)-(17.22) are linear since they are first degree in the
dependent variable $u$.


Exercises 


1. Find a general solution of the PDE of $u = u(x, y)$ given by

$$u_x + 2xy^2 u_y = 0. \tag{17.23}$$

Solution: The characteristic equation of (17.23) reads $dy/dx =
2xy^2$, which has the solution $y = 1/(p - x^2)$. Hence, we have
$p = x^2 + (1/y)$, i.e., the general solution is given by

$$u(x, y) = f\left(x^2 + \frac{1}{y}\right).$$

In fact, $u(x, y)$ is a constant on the characteristic curve $y = 1/(p - x^2)$
whatever value $p$ takes, as proved by

$$\frac{d}{dx}u\left(x, \frac{1}{p - x^2}\right) = \frac{\partial u}{\partial x} + \frac{2x}{(p - x^2)^2}\frac{\partial u}{\partial y} = \frac{\partial u}{\partial x} + 2xy^2\frac{\partial u}{\partial y} = 0,$$

and similarly we have $du/dy = 0$. ♣
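The statement that $p = x^2 + 1/y$ stays constant along a characteristic curve can be checked by integrating the characteristic equation numerically. The sketch below follows the curve through the illustrative starting point $(x, y) = (0, 0.5)$, for which $p = 2$ and the curve is $y = 1/(2 - x^2)$; the starting point, the integrator, and the function names are assumptions made for this check.

```python
def slope(x, y):
    """Right-hand side of the characteristic equation dy/dx = 2 x y^2 of (17.23)."""
    return 2.0 * x * y * y

def follow_characteristic(x0, y0, x1, steps=1000):
    """Integrate dy/dx = 2 x y^2 from x0 to x1 with the classical Runge-Kutta scheme."""
    h = (x1 - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = slope(x, y)
        k2 = slope(x + 0.5 * h, y + 0.5 * h * k1)
        k3 = slope(x + 0.5 * h, y + 0.5 * h * k2)
        k4 = slope(x + h, y + h * k3)
        y += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        x += h
    return y

def invariant(x, y):
    """The combination p = x^2 + 1/y on which the general solution depends."""
    return x * x + 1.0 / y

if __name__ == "__main__":
    y_end = follow_characteristic(0.0, 0.5, 1.0)
    print(y_end, invariant(0.0, 0.5), invariant(1.0, y_end))
```

Since $u = f(p)$, constancy of $p$ along the curve implies constancy of $u$ for every choice of $f$.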


2. Classify second-order PDEs in two independent variables whose general
form is given by

$$\partial_x^2 u + 2a_{12}\partial_x\partial_y u + a_{22}\partial_y^2 u = 0, \tag{17.24}$$

where $a_{12}$, $a_{22}$ are real constants.


Solution: By completing the square, we can write (17.24) as

$$(\partial_x + a_{12}\partial_y)^2 u + (a_{22} - a_{12}^2)\partial_y^2 u = 0. \tag{17.25}$$

Here, let us introduce the new variables $z$ and $w$ by the linear
transformation of the form $x = z$, $y = a_{12}z + (a_{22} - a_{12}^2)^{1/2}w$. We
then have

$$\frac{\partial}{\partial z} = \frac{\partial}{\partial x} + a_{12}\frac{\partial}{\partial y}, \quad \frac{\partial}{\partial w} = (a_{22} - a_{12}^2)^{1/2}\frac{\partial}{\partial y},$$

so that for the case $a_{22} > a_{12}^2$, (17.25) gives

$$\frac{\partial^2 u}{\partial z^2} + \frac{\partial^2 u}{\partial w^2} = 0.$$

This is the elliptic case and is called the Laplace equation in the
$zw$-plane. We easily see that for (17.25) the hyperbolic case is
obtained for $a_{22} < a_{12}^2$. Thus, the sign of the coefficient of the second
term of (17.25) determines the type of the PDE. ♣


17.2 The Laplacian Operator 


17.2.1 Maximum and Minimum Theorem 


We describe the fundamental properties of three operators, the Laplace,
diffusion, and wave operators. These three operators are of great importance in
the theory of PDEs. We begin with a description of the Laplace operator
(or simply Laplacian) $\Delta_n$ on $\mathbb{R}^n$ defined by

$$\Delta_n = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \cdots + \frac{\partial^2}{\partial x_n^2},$$

where $n$ is a positive integer. The Laplacian is not only important in its
own right, but also forms the spatial component of the diffusion operator
$L_D = \partial_t - \Delta_n$ and the wave operator $L_W = \partial_t^2 - \Delta_n$, whose properties are
discussed in Sects. 17.3 and 17.4.

First, we explain the maximum principle for the Laplace equation
given by

$$\Delta_n u(\boldsymbol{x}) = 0,$$

whose solutions are called harmonic functions. Obviously, the
one-dimensional case ($n = 1$) is trivial, so we consider the case where $n > 1$. Let
$D$ be a connected open set and $u$ be a harmonic function in $D$ with
$\sup u(\boldsymbol{x}) = A < \infty$ for $\boldsymbol{x} \in D$.


♠ Maximum and minimum theorem:
The maximum and minimum values of a nonconstant $u$ are achieved on $\partial D$,
the boundary of $D$, and nowhere inside.


Before going to the proof, we examine certain properties of the solutions of
Poisson's equation expressed by

$$\Delta_n u(\boldsymbol{x}) = -4\pi\rho(\boldsymbol{x}). \tag{17.26}$$


♠ Lemma:

If the function $\rho(\boldsymbol{x})$ in Poisson's equation (17.26) is positive (or negative)
at a point $\boldsymbol{x}_0$, then the solution of (17.26) cannot attain its minimum (or
maximum) value at the point $\boldsymbol{x}_0$.


Proof (of the lemma): If the function $u(\boldsymbol{x})$ satisfying (17.26)
attains a minimum at a point $\boldsymbol{x}_0$, then it should attain a minimum with
respect to each component $x_1, x_2, \cdots, x_n$ separately at that point.
Then all the second-order derivatives $\partial^2 u/\partial x_i^2$ would have to be
nonnegative, which means that the left-hand side of (17.26), i.e., the
sum of these second-order derivatives, would have to be nonnegative.
This result contradicts our hypothesis that $\rho(\boldsymbol{x})$ in (17.26) is positive,
which makes the right-hand side negative. Hence, the first part of the
lemma has been proved. The second part of the lemma is proved in a
similar manner by assuming that $\rho(\boldsymbol{x})$ is negative. ♣


We are now ready to verify the maximum and minimum theorem. 


Proof (of the maximum and minimum theorem): The proof is
by contradiction. We first assume that, at some interior point $x_0$,

$$u(x_0) > u_b + \varepsilon,$$

where $\varepsilon > 0$ and $u_b$ is the value of the function u(x) at an arbitrary point on the
boundary of the defining domain D. We further introduce the function

$$v(x) = u(x) + \eta\, r(x)^2,$$

where

$$r(x)^2 = |x - x_0|^2$$

and $\eta$ is some positive constant. It then follows that

$$\Delta_n v = \Delta_n u + 2n\eta = 2n\eta,$$

which says that v(x) is a solution of Poisson's equation (17.26)
with negative $\rho(x)$. Note that $v(x_0) = u(x_0)$ and, writing $v_b = u_b + \eta r^2$
for the value of v at the same boundary point, by hypothesis

$$v(x_0) = u(x_0) > u_b + \varepsilon = v_b + \varepsilon - \eta r^2.$$

Choosing $\eta$ to be so small that throughout D

$$\varepsilon - \eta r^2 > \frac{\varepsilon}{2},$$

we obtain

$$v(x_0) > v_b + \frac{\varepsilon}{2},$$

which implies that v attains its maximum somewhere within the domain
D. This clearly contradicts the lemma above, so our assumption
at the beginning of the proof was false. &
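
The maximum and minimum theorem has a discrete counterpart that is easy to observe numerically. The following is only a sketch under stated assumptions: the square grid, the five-point averaging rule, and the boundary data `i*i - j*j + 3*i` are illustrative choices, not part of the text. Each interior value is repeatedly replaced by the mean of its four neighbours, which is the discrete form of $\Delta_2 u = 0$.

```python
def relax_laplace(boundary, size=12, sweeps=2000):
    """Gauss-Seidel relaxation of the five-point discrete Laplace equation.

    boundary(i, j) supplies the fixed values on the edge of a size x size grid
    (an assumed, purely illustrative setting).
    """
    u = [[0.0] * size for _ in range(size)]
    last = size - 1
    for i in range(size):
        for j in range(size):
            if i in (0, last) or j in (0, last):
                u[i][j] = boundary(i, j)
    for _ in range(sweeps):
        for i in range(1, last):
            for j in range(1, last):
                # Discrete harmonicity: the value equals the neighbour average.
                u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])
    return u


def edge_and_interior_extremes(u):
    """Return (edge min, edge max, interior min, interior max)."""
    last = len(u) - 1
    edge, inner = [], []
    for i, row in enumerate(u):
        for j, value in enumerate(row):
            on_edge = i in (0, last) or j in (0, last)
            (edge if on_edge else inner).append(value)
    return min(edge), max(edge), min(inner), max(inner)


grid = relax_laplace(lambda i, j: i * i - j * j + 3.0 * i)
edge_min, edge_max, inner_min, inner_max = edge_and_interior_extremes(grid)
print(edge_min, inner_min, inner_max, edge_max)
```

In this sketch the interior extremes stay strictly between the extremes of the boundary data, as the theorem leads one to expect for a nonconstant harmonic function.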


17.2.2 Uniqueness Theorem 


The following theorem establishes the uniqueness of the solution of the 
Dirichlet problems for the Laplace equations. 


@ Uniqueness theorem: 
If it exists, the solution of the Dirichlet problem for a Laplace equation 
is unique. 


Proof Suppose that $u_1$ and $u_2$ are solutions on D for the Dirichlet problem
such that

$$\Delta_n u = f(x) \ \text{in } D,$$

$$u = g(x) \ \text{on } \partial D.$$

Let $w = u_1 - u_2$; then $\Delta_n w = 0$ in D and w = 0 on $\partial D$. By the maximum
(or minimum) principle, the point $x_M$ (or $x_m$) that maximizes (or minimizes)
w(x) should be located on the boundary of D. Hence, we have

$$0 = w(x_m) \le w(x) \le w(x_M) = 0$$

for all $x \in D$, which means that $w \equiv 0$ and $u_1 = u_2$. &


17.2.3 Symmetric Properties of the Laplacian 


The Laplacian is invariant under all rigid transformations such as translations
and rotations. A translation from x to a new variable x' is given by

$$x' = x + a,$$

where a is a constant vector in n-dimensional space. The rotation is expressed
by

$$x' = Bx, \tag{17.27}$$

where B is an orthogonal matrix with the property $BB^t = B^tB = I$.
Invariance under translations or rotations means simply that

$$\sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2} = \sum_{j=1}^{n} \frac{\partial^2}{\partial x_j'^2}.$$

The proof for translational invariance is simple, so we leave it to the reader. 
In physical systems, translational invariance is apparent because the physical 
laws are independent of the choice of coordinates. 

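Rotational invariance under (17.27) is proved by using the chain rule on
the derivative such that

$$\sum_{i} \frac{\partial^2}{\partial x_i^2} = \sum_{i}\sum_{j,k} b_{ji} b_{ki} \frac{\partial^2}{\partial x'_j \partial x'_k} = \sum_{j,k} \delta_{jk} \frac{\partial^2}{\partial x'_j \partial x'_k} = \sum_{j} \frac{\partial^2}{\partial x_j'^2},$$

where $b_{ji}$ are the elements of B and we have used the relation

$$\sum_{i} b_{ji} b_{ki} = \delta_{jk}.$$

Thus the proof has been completed.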
Rotational invariance suggests that a two- or three-dimensional Laplacian 
should take a particularly simple form in polar or spherical coordinates. 
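
Rotational invariance can also be checked numerically. The sketch below is an illustration only: the test function `u`, the angle 0.7, the sample point, and the central-difference step are all assumptions chosen for the check. It compares the Laplacian of u at a point with the Laplacian of the same function expressed in rotated coordinates $x' = Bx$.

```python
import math


def laplacian(f, x, y, h=1e-3):
    """Central-difference estimate of f_xx + f_yy at (x, y)."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / (h * h)


def u(x, y):
    # Arbitrary smooth test function (hypothetical choice).
    return x ** 3 * y + math.exp(x) * math.sin(2.0 * y)


theta = 0.7
c, s = math.cos(theta), math.sin(theta)   # B = [[c, s], [-s, c]]


def u_rotated(xp, yp):
    """u as a function of the rotated coordinates, using x = B^t x'."""
    x = c * xp - s * yp
    y = s * xp + c * yp
    return u(x, y)


x0, y0 = 0.4, -0.3
xp0, yp0 = c * x0 + s * y0, -s * x0 + c * y0   # x' = B x

lap_original = laplacian(u, x0, y0)
lap_rotated = laplacian(u_rotated, xp0, yp0)
print(lap_original, lap_rotated)
```

The two printed numbers agree to within the discretization error of the difference formula.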


Exercises 


1. Find the harmonic function for a two-dimensional Laplace equation that 
is invariant under rotations. 


Solution: The two-dimensional Laplacian in polar coordinates is
given by

$$\Delta_2 = \frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2} \quad (r > 0),$$

where we seek solutions u(r) depending only on r. Then we
take the radial part of the Laplace equation, which gives
$u_{rr} + \frac{1}{r} u_r = 0$ (r > 0). This is an ODE whose solution is given by
$u(r) = a \log r + b$ (r > 0), where a, b are constants. Note that the
form of the function log r is scale invariant, up to an additive constant,
under the dilatation transformation $r \to cr$ for a positive constant c. &


2. Find the harmonic function in three dimensions that is invariant under 
rotations. 


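Solution: The Laplacian in spherical coordinates takes the form

$$\Delta_3 = \frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r} + \frac{1}{r^2 \sin\theta}\frac{\partial}{\partial \theta}\left(\sin\theta \frac{\partial}{\partial \theta}\right) + \frac{1}{r^2 \sin^2\theta}\frac{\partial^2}{\partial \varphi^2}.$$

Since the solution depends only on r, we have the Laplace equation
given by $u_{rr} + \frac{2}{r} u_r = 0$ (r > 0). So we have $(r^2 u_r)_r = 0$ and the
solution becomes $u = \frac{a}{r} + b$ (r > 0), where a, b are constants.
This is an important harmonic function that is not finite at the
origin. &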


3. Show that, for an arbitrary integer n > 2, the general form of solutions
with rotational symmetry is given by

$$u(r) = a r^{2-n} + b \quad (n > 2,\ r > 0), \tag{17.28}$$

where a, b are constants.


Solution: This is shown by applying the chain rule to the derivative
such that

$$\Delta_n u(r) = \sum_{i=1}^{n} \frac{\partial}{\partial x_i}\left[\frac{x_i}{r} u'(r)\right] = \sum_{i=1}^{n}\left[\frac{x_i^2}{r^2} u''(r) + \frac{1}{r} u'(r) - \frac{x_i^2}{r^3} u'(r)\right] = u''(r) + \frac{n-1}{r} u'(r), \tag{17.29}$$

where the relation $\partial r/\partial x_i = x_i/r$ is used. If $\Delta_n u = 0$, (17.29)
yields

$$\frac{u''(r)}{u'(r)} = \frac{1-n}{r}.$$

Integrating twice, we have (17.28). &
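
The radial solutions found in Exercises 1 to 3 can be checked directly in Cartesian coordinates. The following sketch is an illustration under assumptions: the sample points, the step size, and the helper names are hypothetical. It applies a central-difference Laplacian in n dimensions to $a \log r + b$ for n = 2 and to $a r^{2-n} + b$ for n > 2.

```python
import math


def radial_solution(point, n, a=1.0, b=0.0):
    """a*log(r) + b for n = 2, and a*r**(2 - n) + b for n > 2."""
    r = math.sqrt(sum(c * c for c in point))
    if n == 2:
        return a * math.log(r) + b
    return a * r ** (2 - n) + b


def laplacian_nd(f, point, h=1e-3):
    """Sum of central second differences along every coordinate axis."""
    total = 0.0
    for i in range(len(point)):
        plus, minus = list(point), list(point)
        plus[i] += h
        minus[i] -= h
        total += (f(plus) + f(minus) - 2.0 * f(point)) / (h * h)
    return total


residuals = {}
for n in (2, 3, 4, 5):
    point = [0.5 + 0.1 * i for i in range(n)]   # any point away from the origin
    residuals[n] = laplacian_nd(lambda p: radial_solution(p, n), point)

print(residuals)
```

Each residual is zero up to the truncation error of the difference formula, whereas a non-harmonic function such as $r^2$ gives a clearly nonzero value (2n).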


17.3 The Diffusion Operator 


17.3.1 The Diffusion Equations in Bounded Domains 


The diffusion equation describes physical phenomena such as Brownian
motion of a particle or heat flow. Its general form is written as

$$L_D u(x_1, x_2, \cdots, t) = 0, \tag{17.30}$$

where $L_D$ is the diffusion operator defined by

$$L_D = \partial_t - \sum_{i=1}^{n} \partial_{x_i}^2. \tag{17.31}$$

If the scale transformation $t \to Dt$ is used, we have the diffusion equation

$$\partial_t u - D\Delta u = 0,$$

where D is the diffusion constant. For heat flow u represents the temperature
at position $x = (x_1, x_2, \cdots)$ and time t, and for Brownian motion u is the
probability of finding a particle at x and t. Hereafter, we treat the system of
the unit diffusion constant D = 1. If we have to go back to the actual diffusion
equation, we do the transformation $t \to Dt$ in the final solution.


17.3.2 Maximum and Minimum Theorem 


We begin by describing the maximum principle for the diffusion equation 
defined in a bounded domain, from which we deduce the uniqueness of initial 
and boundary value problems. 


@ Maximum and minimum theorem: 

Let D be a bounded domain in $\mathbb{R}^n$ and $0 < t < T < \infty$. If u is a real-valued
continuous solution of the diffusion equation (17.30), it takes its maximum
either at the initial time (t = 0) or on the boundary $\partial D$.


Proof For any $\varepsilon > 0$, we set

$$v(x,t) = u(x,t) + \varepsilon |x|^2,$$

for which we have

$$v_t - \Delta_n v = -2n\varepsilon < 0. \tag{17.32}$$

If the maximum for v occurs at an interior point $(x_0, t_0)$ in the domain
$D \times (0,T)$, we know that the first derivatives $v_t, v_{x_1}, v_{x_2}, \cdots$ of v vanish there
and that the second derivative satisfies $\Delta v \le 0$. This contradicts (17.32), so there is
no interior maximum. Suppose now that the maximum occurs at t = T on D;
the time derivative $v_t$ must be nonnegative there because

$$v(x_0, T) \ge v(x_0, T - \delta)$$

for small $\delta > 0$, and

$$\Delta v \le 0,$$

which again contradicts (17.32). Therefore, the maximum of v must be at the
initial time t = 0, namely, on $D \times \{0\}$, or on the boundary $\partial D$. Since $\varepsilon$ is
arbitrary, the same holds for u. Replacing u by −u,
we see that the minimum is also achieved on either $D \times \{0\}$ or $\partial D$. &
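
A discrete analogue of this theorem can be observed with the simplest explicit finite-difference scheme for $u_t = u_{xx}$ on an interval. This is a sketch under assumptions: the scheme, the ratio `dt/dx**2 = 0.4`, and the initial profile are illustrative choices that do not appear in the text. When the ratio does not exceed 1/2, every new value is a weighted average of three old values, so no new extremum can appear in the interior at later times.

```python
import math


def diffuse(initial, steps, ratio=0.4):
    """Explicit scheme for u_t = u_xx with fixed end values.

    ratio = dt / dx**2; values above 0.5 make the scheme unstable.
    Returns the list of profiles for all time levels.
    """
    if not 0.0 < ratio <= 0.5:
        raise ValueError("ratio must lie in (0, 0.5]")
    rows = [list(initial)]
    for _ in range(steps):
        old = rows[-1]
        new = list(old)                      # end points keep their values
        for i in range(1, len(old) - 1):
            new[i] = old[i] + ratio * (old[i - 1] - 2.0 * old[i] + old[i + 1])
        rows.append(new)
    return rows


points = 41
initial = [math.sin(math.pi * k / (points - 1))
           + 0.3 * math.sin(3.0 * math.pi * k / (points - 1))
           for k in range(points)]
rows = diffuse(initial, steps=500)

# Values at t = 0 together with the values at the two end points.
initial_and_boundary = initial + [r[0] for r in rows] + [r[-1] for r in rows]
later_interior = [v for r in rows[1:] for v in r[1:-1]]
print(max(later_interior) <= max(initial_and_boundary) + 1e-12)
```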


17.3.3 Uniqueness Theorem 


The maximum principle can be used to prove uniqueness for the Dirichlet
problem for the diffusion equation. The conditions are given by

$$L_D u = f(x,t) \quad \text{on } D \times (0,\infty),$$

$$u(x,0) = g(x), \qquad u(x,t) = h(t) \quad \text{on } \partial D$$

for given functions f, g, and h.
The following is an immediate corollary of the maximum and minimum 
theorem. 


@ Uniqueness theorem: 
There is at most one solution of the Dirichlet problem for the diffusion 
equation. 


Proof Let u(x,t) and v(x,t) be two solutions of the Dirichlet problem above and
w = u − v be their difference. Hence, we have $L_D w = 0$, w(x,0) = 0, and w(x,t) = 0
on $\partial D$. By the maximum principle, w(x,t) has its maximum at the initial
time or the boundary, exactly where w vanishes. Thus $w(x,t) \le 0$. The same
reasoning for the minimum shows that $w(x,t) \ge 0$. Therefore $w(x,t) \equiv 0$, so
that u = v for all t > 0. &


17.4 The Wave Operator 


17.4.1 The Cauchy Problem 


The wave operator (or d'Alembertian) on $\mathbb{R}^n \times \mathbb{R}$ is expressed by

$$L = \partial_t^2 - \Delta_n = \partial_t^2 - \sum_{i=1}^{n} \partial_{x_i}^2, \tag{17.33}$$

from which we have the wave equation in the general form

$$Lu = \partial_t^2 u - \Delta_n u = 0. \tag{17.34}$$

The wave equation is the prototype of the hyperbolic PDEs and describes
waves with unit velocity of propagation in homogeneous isotropic media. By
making the transformation $t \to ct$, we have the standard form of the wave
equation

$$\partial_t^2 u - c^2 \Delta_n u = 0, \tag{17.35}$$

where c is the wave velocity. The solution for (17.35) is obtained by transforming
the time variable t into ct in the result of (17.34).

The initial value problem for the wave equation is called the Cauchy
problem and is given by the inhomogeneous wave equation

$$\partial_t^2 u(x,t) - \Delta_n u(x,t) = f(x,t) \tag{17.36}$$

under the two initial conditions

$$u(x,0) = \phi(x),$$

$$\partial_t u(x,0) = \psi(x),$$

where f, $\phi$, $\psi$ are continuous and differentiable given functions. For example,
f(x,t) provides an external force acting on the system described by (17.36).
The wave operator (17.33) is a linear operator, so the solution is the sum
of the general solution of the homogeneous equation (17.34) and a particular
solution of the inhomogeneous equation (17.36).


17.4.2 Homogeneous Wave Equations 


First, we provide the solution for the one-dimensional homogeneous version
(f = 0) of the Cauchy problem (17.36), in which the spatial part is defined on
the whole region of one dimension $-\infty < x < \infty$. For example, consider the
case of an infinitely long vibrating string. The wave equation is written as

$$u_{tt} - u_{xx} = 0, \tag{17.37}$$

which is a hyperbolic second-order PDE that we can express by

$$\left(\frac{\partial}{\partial t} - \frac{\partial}{\partial x}\right)\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial x}\right) u = 0. \tag{17.38}$$

Let us set

$$u_t + u_x = v, \tag{17.39}$$

then the first-order PDE for v(t,x) is obtained from (17.38) as

$$v_t - v_x = 0. \tag{17.40}$$

As shown earlier, (17.40) has a solution of the form

$$v(x,t) = g(x+t), \tag{17.41}$$

where g is any function. Thus we must solve (17.39) for u, which is given by

$$u_t + u_x = g(x+t). \tag{17.42}$$

One solution of (17.42) takes the following form:

$$u(x,t) = h(x+t), \tag{17.43}$$

which we can check through direct differentiation of (17.43) by setting p = x + t
such that

$$\frac{\partial u}{\partial x} = \frac{dh}{dp}\frac{\partial p}{\partial x} = h', \qquad \frac{\partial u}{\partial t} = \frac{dh}{dp}\frac{\partial p}{\partial t} = h'.$$

Substituting these into (17.42) gives $2h'(p) = g(p)$. Then we have

$$h(p) = \frac{1}{2}\int^{p} g(p)\, dp. \tag{17.44}$$

Another possibility is the general solution of the homogeneous equation
obtained by setting g = 0 in (17.42), which takes the form

$$u = z(x-t). \tag{17.45}$$

Adding this to (17.43), we have the general expression of a solution,

$$u(x,t) = h(x+t) + z(x-t). \tag{17.46}$$

Now let us solve (17.46) under the initial conditions

$$u(x,0) = \phi(x),$$

$$u_t(x,0) = \psi(x),$$

where $\phi$ and $\psi$ are given functions of x. From (17.46), we have the relations

$$\phi(x) = h(x) + z(x), \tag{17.47}$$

$$\psi(x) = h'(x) - z'(x). \tag{17.48}$$

By differentiating (17.47), we obtain $\phi' = h' + z'$. Combining this with (17.48),
we have

$$h' = \frac{1}{2}(\phi' + \psi), \qquad z' = \frac{1}{2}(\phi' - \psi).$$

Integrating on p yields

$$h(p) = \frac{1}{2}\phi(p) + \frac{1}{2}\int_0^p \psi\, dp + a,$$

$$z(p) = \frac{1}{2}\phi(p) - \frac{1}{2}\int_0^p \psi\, dp - a.$$

So, we get

$$u(x,t) = \frac{1}{2}\left[\phi(x+t) + \phi(x-t)\right] + \frac{1}{2}\int_{x-t}^{x+t} \psi\, dp, \tag{17.49}$$

which is the solution for the initial value problem for the homogeneous equation
(17.37).
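
Formula (17.49) is straightforward to evaluate numerically. The sketch below is illustrative only: the function names and the use of a composite Simpson rule for the integral of $\psi$ are assumptions, not part of the text. With $\phi = \sin$ and $\psi = \cos$ the formula should reproduce the right-to-left travelling wave $\sin(x+t)$, and with $\psi = 0$ it gives the standing wave $\sin x \cos t$.

```python
import math


def integrate(func, lo, hi, panels=200):
    """Composite Simpson rule; panels must be even."""
    h = (hi - lo) / panels
    total = func(lo) + func(hi)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0


def dalembert(phi, psi, x, t):
    """Evaluate (17.49) for unit wave speed."""
    return (0.5 * (phi(x + t) + phi(x - t))
            + 0.5 * integrate(psi, x - t, x + t))


travelling = dalembert(math.sin, math.cos, 0.3, 1.1)
standing = dalembert(math.sin, lambda p: 0.0, 0.3, 1.1)
print(travelling, math.sin(0.3 + 1.1))
print(standing, math.sin(0.3) * math.cos(1.1))
```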


17.4.3 Inhomogeneous Wave Equations 


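Next we solve the initial value problem for an inhomogeneous PDE ($f \ne 0$)
by applying the method of characteristic coordinates. We transform the
variables x, t into new variables $\xi = x + t$, $\eta = x - t$. The wave equation for
the new variables yields

$$u_{\xi\eta} = -\frac{1}{4} f\left(\frac{\xi+\eta}{2}, \frac{\xi-\eta}{2}\right).$$

This equation can be integrated with respect to $\eta$, leaving $\xi$ as a constant.
Thus we have

$$u_\xi = -\frac{1}{4}\int^{\eta} f\, d\eta, \tag{17.50}$$

where the lower limit of integration is arbitrary. Again we can integrate with
respect to $\xi$:

$$u(\xi,\eta) = -\frac{1}{4}\int^{\xi}\!\!\int^{\eta} f\left(\frac{\xi+\eta}{2}, \frac{\xi-\eta}{2}\right) d\eta\, d\xi. \tag{17.51}$$

Here we consider the dependent variable u at a fixed point $(\xi_0, \eta_0)$ defined
by

$$\xi_0 = x_0 + t_0, \qquad \eta_0 = x_0 - t_0. \tag{17.52}$$

We can evaluate (17.51) at the point $(\xi_0, \eta_0)$ and make a particular choice of
the lower limits such that

$$u(\xi_0,\eta_0) = \frac{1}{4}\int_{\eta_0}^{\xi_0}\!\!\int_{\eta_0}^{\xi} f\left(\frac{\xi+\eta}{2}, \frac{\xi-\eta}{2}\right) d\eta\, d\xi,$$

where the minus sign of (17.51) has been absorbed by orienting the inner
integral from $\eta_0$ up to $\xi$. Here we change the variables $\xi$, $\eta$ into the
original ones (x, t), and the Jacobian is the absolute value of the determinant
of the coefficient matrix:

$$J = \left|\det\begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\right| = 2.$$

Thus $d\xi\, d\eta = J\, dx\, dt = 2\, dx\, dt$, so the double integral can be transformed as

$$u(x_0,t_0) = \frac{1}{2}\int_0^{t_0}\!\!\int_{x_0-(t_0-t)}^{x_0+(t_0-t)} f(x,t)\, dx\, dt.$$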

As a result, we have the following theorem:


@ Theorem:
The unique solution of (17.36) on one spatial dimension is given by

$$u(x,t) = \frac{1}{2}\left[\phi(x+t) + \phi(x-t)\right] + \frac{1}{2}\int_{x-t}^{x+t} \psi(p)\, dp + \frac{1}{2}\int_0^t\!\!\int_{x-(t-t')}^{x+(t-t')} f(x',t')\, dx'\, dt',$$

where $\phi(x) = u(x,0)$ and $\psi(x) = u_t(x,0)$.
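
The source term of the theorem is an integral of f over the triangle bounded by the two characteristics through (x, t) and the line t' = 0. The sketch below evaluates it by nested Simpson quadrature; the helper names and the quadrature rule are assumptions made for illustration. For $f = \sin x$ with zero initial data one can verify by substitution that $u = \sin x\,(1 - \cos t)$ solves (17.36), which gives an independent check.

```python
import math


def integrate(func, lo, hi, panels=100):
    """Composite Simpson rule; panels must be even."""
    h = (hi - lo) / panels
    total = func(lo) + func(hi)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0


def source_term(f, x, t, panels=100):
    """Half the integral of f over the characteristic triangle with apex (x, t)."""
    def inner(tp):
        half_width = t - tp          # the triangle narrows as t' approaches t
        return integrate(lambda xp: f(xp, tp),
                         x - half_width, x + half_width, panels)
    return 0.5 * integrate(inner, 0.0, t, panels)


constant_force = source_term(lambda x, t: 1.0, 0.7, 1.3)
sine_force = source_term(lambda x, t: math.sin(x), 0.7, 1.3)
print(constant_force, 1.3 ** 2 / 2.0)
print(sine_force, math.sin(0.7) * (1.0 - math.cos(1.3)))
```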


17.4.4 Wave Equations in Finite Domains 


In this section we attempt to solve the wave equations defined in the region
$D \times (0,\infty)$, where D is a bounded domain of $\mathbb{R}^n$. For this problem, we have
to specify the initial conditions at t = 0 as well as some boundary conditions
on $\partial D$. As noted in Sect. 17.1.2, the commonly used boundary conditions are
the Dirichlet and Neumann conditions. First we treat a homogeneous wave
equation with no external term given by

$$\partial_t^2 u - \Delta_n u = 0, \tag{17.53}$$

where the initial condition is

$$u(x,0) = f(x), \qquad \partial_t u(x,0) = g(x), \tag{17.54}$$

and the boundary conditions on $\partial D$ are given by

$$u(x,t) = 0 \quad \text{or} \quad \partial_n u(x,t) = 0. \tag{17.55}$$

Thus, when the boundary conditions are independent of t, the method of
separation of variables is useful, i.e., we assume that the solution u takes
the form

$$u(x,t) = X(x)T(t), \tag{17.56}$$

where X satisfies the boundary conditions (17.55) on $\partial D$. Substituting (17.56)
into (17.53), we have

$$\frac{\Delta X(x)}{X(x)} = \frac{T''(t)}{T(t)} = -\mu^2. \tag{17.57}$$

This defines a quantity $\mu^2$ that must be constant since $\Delta X/X$ depends only
on x and $T''/T$ depends only on t. The reason for the positive constant $\mu^2 > 0$
will be seen later.

Equation (17.57) gives a pair of separate differential equations for X(x)
and T(t): the one equation is

$$\Delta X(x) = -\mu^2 X(x) \tag{17.58}$$

that satisfies the given boundary conditions of (17.55), and the other is

$$T''(t) = -\mu^2 T(t) \tag{17.59}$$

in which $0 < t < \infty$. The solution for this ODE is obtained as

$$T(t) = a\cos\mu t + b\sin\mu t, \tag{17.60}$$

where a and b are constants that can be determined from the initial conditions.
Combining (17.60) with X(x), the solution is expressed as

$$u(x,t) = X_\mu(x)(a\cos\mu t + b\sin\mu t). \tag{17.61}$$

This is a normal mode of vibration with eigenfrequency $\mu$ and the general
solution is obtained by the superposition of normal modes. Thus, we have
the general solution of the form

$$u(x,t) = \sum_n X_n(x)(a_n\cos\mu_n t + b_n\sin\mu_n t). \tag{17.62}$$

For example, from the initial conditions in (17.54), we have

$$\sum_n a_n X_n = f, \qquad \sum_n \mu_n b_n X_n = g,$$

so the coefficients in (17.62) are given by

$$a_n = (f \mid X_n), \qquad b_n = (g \mid X_n)/\mu_n.$$


Exercises 


1. Find the general solution for the wave equation defined on the one-dimensional
bounded domain $(0,l) \times (0,\infty)$, which is given by

$$\partial_t^2 u - \Delta u = 0,$$

under the conditions: u(x,0) = f(x), $\partial_t u(x,0) = g(x)$, u(0,t) =
u(l,t) = 0.

Solution: The normalized eigenfunctions are $X_n = \sqrt{2/l}\,\sin(n\pi x/l)$,
and the associated eigenfrequencies $\mu_n$ are the integer multiples
of the fundamental frequency $\pi/l$. Thus, we obtain

$$u(x,t) = \sum_n \left(a_n \cos\frac{n\pi t}{l} + b_n \sin\frac{n\pi t}{l}\right)\sin\frac{n\pi x}{l},$$

where the coefficients are

$$a_n = \frac{2}{l}\int_0^l f(x)\sin\frac{n\pi x}{l}\, dx, \qquad b_n = \frac{2}{n\pi}\int_0^l g(x)\sin\frac{n\pi x}{l}\, dx. \ \&$$
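
The normal-mode solution of Exercise 1 can be sketched numerically as follows. Everything specific here is an assumption chosen for illustration: the initial shape $f(x) = x(l - x)$, zero initial velocity, l = 1, 25 modes, and Simpson quadrature for the coefficients. For this f the coefficients are known in closed form, $a_n = 8l^2/(n^3\pi^3)$ for odd n and zero for even n, which provides a check.

```python
import math


def simpson(func, lo, hi, panels=400):
    """Composite Simpson rule; panels must be even."""
    h = (hi - lo) / panels
    total = func(lo) + func(hi)
    for k in range(1, panels):
        total += (4.0 if k % 2 else 2.0) * func(lo + k * h)
    return total * h / 3.0


def string_modes(f, g, length, count):
    """Coefficients (a_n, b_n), n = 1..count, for fixed ends at 0 and length."""
    coeffs = []
    for n in range(1, count + 1):
        k = n * math.pi / length
        a_n = 2.0 / length * simpson(lambda x: f(x) * math.sin(k * x), 0.0, length)
        b_n = 2.0 / (n * math.pi) * simpson(lambda x: g(x) * math.sin(k * x), 0.0, length)
        coeffs.append((a_n, b_n))
    return coeffs


def string_displacement(coeffs, length, x, t):
    """Truncated superposition of normal modes at position x and time t."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(coeffs, start=1):
        k = n * math.pi / length
        total += (a_n * math.cos(k * t) + b_n * math.sin(k * t)) * math.sin(k * x)
    return total


length = 1.0
coeffs = string_modes(lambda x: x * (length - x), lambda x: 0.0, length, 25)
print(coeffs[0][0], 8.0 / math.pi ** 3)
print(string_displacement(coeffs, length, 0.3, 0.0), 0.3 * 0.7)
```

Because every eigenfrequency is an integer multiple of $\pi/l$, the motion repeats after the time 2l, which the truncated series reproduces as well.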


2. The diffusion equation with a point source at r = 0 and time $t_0 = 0$ in an
infinite medium is given by

$$\partial_t G - \Delta_3 G = \delta(r)\delta(t). \tag{17.63}$$

Find the solution G(r,t), called Green's function, by means of Fourier
and Laplace transforms.


Solution: If we take a Fourier transform in space and a Laplace
transform in time, (17.63) becomes $G(k,\omega) = 1/[(2\pi)^3(\omega + k^2)]$.
The inverse Laplace transform yields $G(k,t) = e^{-k^2 t}/(2\pi)^3$. We
then obtain Green's function G(r,t) by the spatial Fourier transform
given by

$$G(r,t) = \frac{1}{(2\pi)^3}\int e^{-k^2 t} e^{i k\cdot r}\, d^3k.$$

Integrating over the angles of k yields

$$G(r,t) = \frac{1}{2\pi^2}\int_0^\infty e^{-k^2 t}\,\frac{\sin kr}{kr}\, k^2\, dk = \frac{1}{4\pi^2 r}\,\mathrm{Im}\int_{-\infty}^{\infty} e^{-k^2 t} e^{ikr}\, k\, dk = \frac{1}{4\pi^2 r}\,\mathrm{Im}\left[\frac{1}{i}\frac{\partial}{\partial r}\int_{-\infty}^{\infty} e^{-k^2 t} e^{ikr}\, dk\right],$$

which gives Green's function in the form $G(r,t) = e^{-r^2/4t}/(4\pi t)^{3/2}$,
t > 0. For t < 0, G(r,t) = 0. &
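
The closed form obtained in Exercise 2 can be checked without any transforms. The sketch below uses assumed helper names, step sizes, and sample values. It evaluates the residual of the radial diffusion equation $G_t = G_{rr} + (2/r)G_r$ by central differences at a point with t > 0, and confirms that the total amount $\int 4\pi r^2 G\, dr$ equals one, as expected for a unit point source.

```python
import math


def heat_kernel(r, t):
    """G(r, t) = exp(-r^2 / 4t) / (4 pi t)^(3/2) for t > 0."""
    return math.exp(-r * r / (4.0 * t)) / (4.0 * math.pi * t) ** 1.5


def residual(r, t, h=1e-4):
    """Central-difference estimate of G_t - (G_rr + (2/r) G_r)."""
    g_t = (heat_kernel(r, t + h) - heat_kernel(r, t - h)) / (2.0 * h)
    g_r = (heat_kernel(r + h, t) - heat_kernel(r - h, t)) / (2.0 * h)
    g_rr = (heat_kernel(r + h, t) - 2.0 * heat_kernel(r, t)
            + heat_kernel(r - h, t)) / (h * h)
    return g_t - (g_rr + 2.0 / r * g_r)


def total_mass(t, r_max=30.0, panels=3000):
    """Midpoint-rule estimate of the integral of 4 pi r^2 G over [0, r_max]."""
    h = r_max / panels
    total = 0.0
    for k in range(panels):
        r = (k + 0.5) * h
        total += 4.0 * math.pi * r * r * heat_kernel(r, t) * h
    return total


print(residual(0.8, 0.5))
print(total_mass(0.5))
```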


3. Find the half-space one-dimensional Green's function defined on x > 0
that satisfies the boundary condition G(x,t) = 0 at x = 0.


Solution: Using an image source of negative strength at $x = -x_0$,
the solution is expressed by $G(x,t) = G_0(x - x_0, t) - G_0(x + x_0, t)$,
x > 0, where $G_0(x,t) = e^{-x^2/4t}/(4\pi t)^{1/2}$. &


4. Consider the wave equation with a source term h(r,t) given by

$$\partial_t^2 f - \Delta_3 f = h(r,t).$$

Show that the solution is expressed as

$$f(r,t) = \frac{1}{4\pi}\int \frac{h(r', t - |r - r'|)}{|r - r'|}\, d^3r'. \tag{17.64}$$


Solution: The Green function G(r,t) is defined as the solution satisfying
the equation $\partial_t^2 G - \Delta_3 G = \delta(r)\delta(t)$. The spatial Fourier
and temporal Laplace transform of the above is obtained by setting
$\omega \to -\omega^2$ in Exercise 2 as $G(k,\omega) = 1/[(2\pi)^3(-\omega^2 + k^2)]$.
The inverse transform of the above gives $G(|r - r'|, t - t') =
\delta(|r - r'| - (t - t'))/(4\pi|r - r'|)$. Since the physical system is invariant
under the translations in space and time, the Green function
depends only on relative space and time coordinates $|r - r'|$ and
$t - t'$. The Green function has the property that the solution can
be written as $f(r,t) = \int G(r,r';t,t')\, h(r',t')\, d^3r'\, dt'$, so it is given
by (17.64). &
5. Find the general solution for the wave equation

$$\partial_t^2 u - \Delta_3 u = 0,$$

where $\Delta_3$ is the Laplacian defined by

$$\Delta_3 = \partial_x^2 + \partial_y^2 + \partial_z^2.$$


Solution: Since the system is isotropic and homogeneous, we can
assume the solution in analogy with the one-dimensional case as
f(p) with $p = n\cdot x \pm t$, where n is the unit vector that points along the
direction of propagation of the wave and $n\cdot x = lx + my + nz$. From
the chain rule on derivatives, we have $(l^2 + m^2 + n^2 - 1) f''(p) = 0$.
Thus, f can be arbitrary since $l^2 + m^2 + n^2 = 1$. The general
solution is given by $u(x,t) = f(n\cdot x - t) + g(n\cdot x + t)$, where n
is any unit vector. &


17.5 Applications in Physics and Engineering 
17.5.1 Wave Equations for Vibrating Strings 


In the previous sections, we rigorously studied the theories underlying the 
following three typical classes of partial differential equations: Laplace equa- 
tions, diffusion equations, and wave equations. In this section, we attempt 
to formulate mathematical expressions for these classes on the basis of the 
associated physical phenomena; e.g., we will see that the mathematical form 
of the wave equation 

$$\partial_t^2 u(x,t) = v^2 \partial_x^2 u(x,t) \tag{17.65}$$

is derived by considering the wavy motion of a thin string. This will make
clear why equations of the same form as (17.65) are called wave equations.


Fig. 17.1. Schematic of a thin stretched string and a line element $\Delta s$; the tensions
exerted at both ends of the line element have equal magnitudes $\tau$ but act in different
directions


Suppose that a thin string is stretched between two fixed points with a
tension $\tau$ exerted at the two end points. We assume that the string is perfectly
flexible, i.e., only tensile forces can be transmitted in the tangential direction.
Then, as illustrated in Fig. 17.1, the magnitude of the tension exerted in the
tangential direction is the same for every part of the string. It can be seen
from the figure that the vertical components of the tension at the two ends
of a line element with length $\Delta s$ are $-\tau\sin\theta_1$ and $\tau\sin\theta_2$, where $\tau = |\boldsymbol{\tau}|$.
Hence, the vertical component $\tau_u$ of the external force exerted on the line
element is given by

$$\tau_u = \tau(\sin\theta_2 - \sin\theta_1). \tag{17.66}$$


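The sine terms are rewritten in terms of the derivative $\partial_x u$ by using the
following approximations:

$$\sin\theta_1 = \frac{\partial u}{\partial s} = \frac{\partial u}{\partial x}\frac{dx}{ds} = \partial_x u\,\frac{1}{\sqrt{1 + (\partial_x u)^2}} \simeq \partial_x u\left[1 - \frac{1}{2}(\partial_x u)^2 + \cdots\right] \simeq \partial_x u,$$

where the relation $ds = \sqrt{(dx)^2 + (du)^2}$ was used, and

$$\sin\theta_2 = \frac{\partial u(x + \Delta x)}{\partial s} = \frac{\partial u(x + \Delta x)}{\partial x}\frac{dx}{ds} \simeq \frac{\partial u(x + \Delta x)}{\partial x} = \partial_x u + \partial_x^2 u\big|_{x=\xi}\,\Delta x,$$

where $\xi$ is a constant satisfying the condition $x < \xi < x + \Delta x$. (The mean
value theorem ensures the existence of such a constant $\xi$.) Substitution of
the sine terms in (17.66) yields

$$\tau_u = \tau\,\partial_x^2 u\big|_{x=\xi}\,\Delta x.$$

Since the inertial force exerted on the line element is given by

$$\rho\,\Delta s\,\partial_t^2 u,$$

where $\rho$ is the line density, we obtain the equation of motion for the string
in the vertical direction:

$$\rho\,\Delta s\,\partial_t^2 u = \tau\,\partial_x^2 u\big|_{x=\xi}\,\Delta x.$$

Taking the limit $\Delta x \to 0$ so that $\xi \to x$, and then approximating ds/dx as 1,
we obtain the final result:

$$\partial_t^2 u = \frac{\tau}{\rho}\,\partial_x^2 u. \tag{17.67}$$

It is customary to denote the positive constant $\tau/\rho$ as $v^2$, which allows us to
write (17.67) in the familiar form (17.65).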


17.5.2 Diffusion Equations for Heat Conduction 


In this subsection, we attempt to derive mathematical expressions for diffu- 
sion equations by considering the physical phenomena that occur during heat 
conduction, i.e., the flow of heat in a certain medium from points at a high 
temperature to those at lower temperatures. This process takes place in such 
a manner that molecules in irregular motion exchange their kinetic energy by 
colliding with each other. 

We aim to determine the amount of heat $\delta Q$ penetrating an arbitrarily
chosen surface element $\delta S$ inside the medium per unit time (called the heat
flux). In order to find $\delta Q$, we consider another surface element $\delta S_1$, which
has the same magnitude as $\delta S$, parallel to $\delta S$ and located at an orthogonal
distance $\delta n$ from $\delta S$. We assume that $\delta S$ is so small that the temperatures
$u = u(x,y,z,t)$ on $\delta S$ and $u_1 = u(x + \delta x, y + \delta y, z + \delta z, t)$ on $\delta S_1$ are constant
over $\delta S$ and $\delta S_1$, respectively.

From thermodynamics, we know that the magnitude of the heat flux $\delta Q$, the
temperature difference between u and $u_1$, denoted by $\delta u$, and the area of the
surface element are related in the following manner:

$$\delta Q \cdot \delta n = \kappa \cdot \delta u \cdot \delta S, \tag{17.68}$$

where the value of the constant $\kappa$, called the thermal conductivity, depends
on the medium. Dividing both sides of (17.68) by $\delta n$ and taking the limit
$\delta n \to 0$, we have

$$\delta Q = \kappa\,\frac{\partial u}{\partial n}\cdot\delta S.$$

Here, $\partial u/\partial n$ is the derivative in the direction normal to $\delta S$ and is expressed
as

$$\frac{\partial u}{\partial n} = \nabla u \cdot n,$$

where n is the unit vector normal to $\delta S$. Thus, we obtain the flux passing
through a volume element $\delta v$ that is enclosed by a surface S:

$$Q = \kappa\iint_S \frac{\partial u}{\partial n}\, dS = \kappa\iint_S \nabla u\cdot n\, dS = \kappa\iiint_{\delta v} \nabla\cdot(\nabla u)\, dv.$$

We now apply the mean value theorem to the volume integral over $\delta v$ to
obtain

$$Q = \kappa\,\nabla\cdot\left[\nabla u(x^*, y^*, z^*, t)\right]\delta v, \tag{17.69}$$

where $(x^*, y^*, z^*)$ is a point in $\delta v$.

Apart from the above-mentioned discussion, we also see that Q is related
to the temperature variation in $\delta v$ with time. In fact, the temperature u in $\delta v$
increases (or decreases) owing to the accumulation (or loss) of heat in $\delta v$ at a
rate of $\partial u/\partial t$. Therefore, the flow of heat into (or out of) $\delta v$ can be written
as

$$Q = \rho\sigma\,\frac{\partial u}{\partial t}\,\delta v, \tag{17.70}$$

where $\rho$ is the mass density and $\sigma$ (the specific heat) is a characteristic of
the medium. By setting (17.69) equal to (17.70), and allowing the volume
element $\delta v$ to shrink to a point, we have

$$\rho\sigma\,\partial_t u = \kappa\,\nabla\cdot(\nabla u) = \kappa\nabla^2 u.$$

Clearly, this result is of the same form as a diffusion equation that describes
heat conduction in a medium with physical parameters $\rho$, $\sigma$, $\kappa$.


Part VI 


Tensor Analyses 


18 


Cartesian Tensors 


Abstract Tensors are geometric entities that provide concise mathematical frame- 
works for formulating problems in physics and engineering. The most important 
feature of tensors is their coordinate invariance: tensors are independent of the type 
of coordinate system chosen. This feature is similar to the condition that the length 
and direction of a geometric figure do not change, regardless of the coordinate sys- 
tem used for the algebraic expression. In contrast, the components of a tensor are 
coordinate-dependent in a structured routine. In this chapter, we discuss the ways 
in which the choice of a coordinate system affects the components of a tensor. 


18.1 Rotation of Coordinate Axes 


18.1.1 Tensors and Coordinate Transformations 


A tensor is a natural generalization of a vector or a scalar encountered in
elementary vector calculus. The latter two are, in fact, both special cases of
tensors of order n, whose specification in a three-dimensional space requires
$3^n$ numbers, called the components of the tensor. In this context, scalars are
tensors of the zero order with a $3^0 = 1$ component and vectors are tensors of
the first order with $3^1 = 3$ components.

Of importance is the fact that a tensor of order n is much more than
just a set of $3^n$ numbers. The key property of tensors is adherence to the
transformation law of its components under a change of coordinate system, 
say, from a rectangular to elliptic, polar, or other curvilinear coordinate sys- 
tem. If the coordinate system is changed to a new one, the components of 
a tensor change according to a characteristic transformation law. We shall 
see that this transformation law makes clear the physical (or geometrical) 
meaning of the tensor being invariant under a change of coordinate system. 
The coordinate-invariance-character of tensors answers the demand that the 
proper formulation of physical laws should be independent of the choice of 
coordinate systems. 




It is obvious that physical processes must be independent of the coordinate
system. However, what is not so trivial is what the coordinate-independence
property of physical processes implies about the transformation law of
mathematical objects (i.e., tensors). The study of these implications and the
classification of physical quantities by means of the transformation laws constitute
the primary content of this chapter. Emphasis is placed on the fact that all
kinds of tensors are geometric objects whose representations (i.e., the values of
their components) obey a characteristic transformation law under coordinate
transformation.


18.1.2 Summation Convention 


In order to simplify subsequent notation, we introduce the following two con- 
ventions: 


@ Summation convention: 

When the same index appears repeatedly in one term, we carry out a 
summation with respect to the index. The range of summation is from 1 
to n, where n is the number of dimensions of the space. 


@ Range convention: 
All non repeated indices are understood to run from 1 to n. 


These conventions are operative throughout this chapter unless specifically 
stated otherwise. 


Example The summation convention yields the new notation as

$$a_i b_i = \sum_{i=1}^{n} a_i b_i = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n.$$

Similarly, if i and j have the range from 1 to 2, then

$$a_{ij} b_{ij} = a_{1j} b_{1j} + a_{2j} b_{2j} = a_{11}b_{11} + a_{12}b_{12} + a_{21}b_{21} + a_{22}b_{22},$$

where it does not matter whether the first sum is carried out on i or j.


Remark. Repeated indices are referred to as dummy indices since, owing to 
the implied summation, any such pair may be replaced by any other pair of 
repeated indices without changing the meaning of the mathematical expres- 
sion. 
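
In code, the summation convention simply becomes an explicit loop over every repeated index. The short sketch below uses hypothetical helper names and sample values to spell out the two expressions of the example.

```python
def contract_vectors(a, b):
    """a_i b_i: the repeated index i is summed over its whole range."""
    return sum(a[i] * b[i] for i in range(len(a)))


def contract_matrices(a, b):
    """a_ij b_ij: both i and j are repeated, so both are summed."""
    return sum(a[i][j] * b[i][j]
               for i in range(len(a))
               for j in range(len(a[i])))


def contract_matrices_j_first(a, b):
    """Same double sum with the order of the two summations exchanged."""
    return sum(a[i][j] * b[i][j]
               for j in range(len(a[0]))
               for i in range(len(a)))


a_vec, b_vec = [1, 2, 3], [4, 5, 6]
a_mat, b_mat = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
print(contract_vectors(a_vec, b_vec))          # 1*4 + 2*5 + 3*6 = 32
print(contract_matrices(a_mat, b_mat))         # 5 + 12 + 21 + 32 = 70
print(contract_matrices_j_first(a_mat, b_mat)) # same value, 70
```

Renaming the loop variable changes nothing, which is exactly the sense in which a repeated index is a dummy index.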


18.1.3 Cartesian Coordinate System 


A tensor is a mathematical object composed of several components. The values 
of the components depend on the coordinate system to be employed, so are 
altered through a coordinate transformation even when the tensor itself re- 
mains unchanged. Among the many possible choices of coordinate transfor- 
mations, a rigid rotation of a rectangular Cartesian coordinate sys- 
tem is the simplest. The remainder of this section is devoted to explaining 
the basic properties of the simplest coordinate transformations, as a prelimi- 
nary for our subsequent study of tensors in terms of more general coordinate 
systems. 

We begin with two formal definitions: 


@ Cartesian coordinate system: 

A Cartesian coordinate system associates a unique ordered set of real
numbers (coordinates) $(x_1, x_2, \cdots, x_n)$ with every point in a given
n-dimensional space by reference to a set of directed straight lines (coordinate
axes) $Ox_1, Ox_2, \cdots, Ox_n$ intersecting at the origin O.


@ Rectangular Cartesian coordinate system: 

If the axes of a Cartesian coordinate system are mutually perpendicular,
we have what is called a rectangular Cartesian coordinate
system.


Figure 18.1 is a schematic illustration of a rectangular Cartesian coordinate
system in three-dimensional space. Referring to this coordinate system, we
denote the triples (1,0,0), (0,1,0), (0,0,1) by $e_1$, $e_2$, $e_3$, respectively. These
triples are represented geometrically by mutually perpendicular unit arrows.

The set of Cartesian axes $Ox_1$, $Ox_2$, and $Ox_3$ is said to be right-handed
if and only if the rotation needed to turn the $x_1$-axis into the direction of
the $x_2$-axis through an angle $\angle x_1 O x_2 < \pi$ would propel a right-handed screw
toward the positive direction of the $x_3$-axis. Conversely, if such a rotation
propels a left-handed screw in the positive direction of the $x_3$-axis, the set of
axes is said to be left-handed. In this section, we consider only Cartesian
coordinate systems that are rectangular and right-handed.


Fig. 18.1. (a) Right-handed and (b) left-handed Cartesian coordinate systems


18.1.4 Rotation of Coordinate Axes 


We now formulate a rigid rotation of rectangular Cartesian axes. Assume a
position arrow r whose components are given by $(x_1, x_2, x_3)$ and $(x'_1, x'_2, x'_3)$
in terms of two different rectangular coordinate systems having a common
origin. We denote the set of unit orthogonal basis arrows associated with the
unprimed and primed system by $\{e_i\}$ and $\{e'_i\}$, respectively. The transformation
from one Cartesian coordinate system to another is called a rigid
rotation of Cartesian axes and has the following property:


@ A rigid rotation of Cartesian axes:
A rigid rotation of Cartesian axes is described by the transformation
equations of coordinates $x_k$ as

$$x'_j = R_{jk} x_k \quad \text{(summed over k)}, \tag{18.1}$$

$$x_k = R_{jk} x'_j \quad \text{(summed over j)}, \tag{18.2}$$

where $R_{jk} = e'_j \cdot e_k$ are the direction cosines of $e'_j$ with respect to $e_k$.


Remarks. 


1. Each of the two indices j and k for $R_{jk}$ refers to a different basis: the first
index j refers to the primed set $\{e'_j\}$, while the second index k refers to
the unprimed one $\{e_k\}$. This means that, in general, $R_{jk} \ne R_{kj}$.


2. The transformation coefficients $R_{jk}$ do not constitute a tensor, but simply
a set of real numbers. (See the second remark in Sect. 18.2.3.)


Proof A geometric arrow r joining the origin O and the point P is expressed
by

$$r = x_k e_k = x'_j e'_j. \tag{18.3}$$

We expand $e_k$ by the set of $\{e'_j\}$ as

$$e_k = (e'_j \cdot e_k)\, e'_j = R_{jk} e'_j, \tag{18.4}$$

where we use both the range and the summation conventions. Substituting
(18.4) into (18.3), we obtain

$$x_k R_{jk} e'_j = x'_j e'_j,$$

and thus

$$(x_k R_{jk} - x'_j)\, e'_j = 0.$$

Since the arrow set $\{e'_j\}$ is linearly independent, the quantities in the
parentheses equal zero, which results in the desired equation (18.1).
Similarly, expanding $\{e'_j\}$ by $\{e_k\}$ as

$$e'_j = (e_k \cdot e'_j)\, e_k = R_{jk} e_k \tag{18.5}$$

and substituting it into (18.3), we arrive at equation (18.2). &


Remark. Observe that in the transformation law (18.1) and the expansion
(18.5), $R_{jk}$ acts on an unprimed entity (i.e., $x_k$ or $e_k$) to produce a primed one
(i.e., $x'_j$ or $e'_j$). However, in (18.2) and (18.4), $R_{jk}$ acts on a primed entity to
produce an unprimed one. In all the cases above, we should make sure that the
order of indices j and k attached to the coefficients $R_{jk}$ remains unchanged:
the first index, j, always refers to the primed entity and the second, k, to the
unprimed one.


18.1.5 Orthogonal Relations 


The following theorem states an important property of the transformation
coefficients $R_{jk}$ that gives rise to a rigid rotation of Cartesian axes.


@ Orthogonal relations:
The transformation coefficients $R_{jk}$ for a rigid rotation of axes satisfy
the conditions

$$R_{ik} R_{jk} = \delta_{ij}, \qquad R_{ik} R_{il} = \delta_{kl}. \tag{18.6}$$


Proof The first relation of (18.6) follows from a geometric formula for the
angle $\theta$ between two basic arrows: $e'_i$ and $e'_j$. Taking the inner product of the
two basic arrows $e'_i = R_{ik} e_k$ and $e'_j = R_{jl} e_l$, we have

$$\cos\theta = e'_i \cdot e'_j = R_{ik} R_{jl} (e_k \cdot e_l) = R_{ik} R_{jl} \delta_{kl} = R_{ik} R_{jk}. \tag{18.7}$$

If i = j, $e'_i$ and $e'_j$ coincide so that $\theta = 0$, whereas if $i \ne j$, $e'_i$ and $e'_j$ are
orthogonal so that $\theta = \pi/2$. Hence, we have

$$R_{ik} R_{jk} = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

The second equation of (18.6) can be verified in a similar manner by
considering the angle between $e_k$ and $e_l$. &


The physical meaning of the relations (18.6) is rather obvious. They ensure
that the axes of each set $\{e'_j\}$ or $\{e_k\}$ are mutually orthogonal, i.e.,

$$e'_i \cdot e'_j = \delta_{ij} \quad \text{and} \quad e_k \cdot e_l = \delta_{kl}.$$



18.1.6 Matrix Representations 


Since the transformation coefficients $R_{jk}$ have two subscripts, it is natural
to display their values in matrix form. The notation $[R_{jk}]$ is used to denote
the matrix having $R_{jk}$ as the element in the jth row and kth column. In
addition, when denoted by R, it represents a linear operator of a rotation of
axes without reference to any of the values of its coefficients $R_{jk}$.


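Example In two dimensions, a rigid rotation of rectangular axes through an angle
$\theta$ is given by

$$\mathbf{e}'_i = R_{ij}\mathbf{e}_j, \qquad [R_{ij}] = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}. \tag{18.8}$$

This rotation of axes gives rise to a coordinate transformation,

$$x'_i = R_{ij}x_j,$$

or equivalently,

$$\begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix} = \begin{pmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}. \tag{18.9}$$
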
Remark. We comment briefly on the distinction between active and passive
transformations, since it often causes confusion. Throughout this chapter, we
are concerned solely with passive transformations, for which physical entities
of interest (e.g., the mass of a particle or a geometric arrow) remain unaltered
and only the coordinate system is changed from $\{\mathbf{e}_i\}$ to $\{\mathbf{e}'_i\}$, as given by
(18.8). In contrast, an active transformation alters the position and/or the
direction of the physical entity itself, while the axes $\{\mathbf{e}_i\}$ remain fixed. In
the latter case, a rotation of a geometric arrow $\mathbf{x}$ through an angle $\theta$ in two
dimensions is described by

$$\mathbf{e}'_i = \mathbf{e}_i \quad \text{and} \quad \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},$$

which obviously differs from the expression for a passive transformation. Figure 18.2
illustrates the difference between the two transformations.
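
The two conventions are easy to compare in a few lines of code. The sketch below assumes the two-dimensional matrices quoted in this section; the function names `passive` and `active` are hypothetical labels chosen for illustration.

```python
import math


def passive(theta, x):
    """Components of the same arrow relative to axes rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] + s * x[1], -s * x[0] + c * x[1])


def active(theta, x):
    """Components, in the fixed axes, of the arrow after it is rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1])


x = (1.0, 0.0)
theta = math.pi / 6
print(passive(theta, x))   # second component is negative: axes turned, arrow fixed
print(active(theta, x))    # second component is positive: arrow turned, axes fixed
# Rotating the axes by theta gives the same numbers as rotating the arrow by -theta.
print(passive(theta, x), active(-theta, x))
```

The last line illustrates why the two pictures are so easily confused: the matrices differ only by the sign of the angle.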


It can be shown that the determinant of the matrix $[R_{kl}]$ reads

$$\det[R_{kl}] = \pm 1 \tag{18.10}$$

(see the proof in Exercise 5). This means that there are two classes of
rectangular Cartesian coordinate systems, corresponding to the positive and
negative signs in (18.10). Throughout this section, we consider only cases of
$\det[R_{kl}] = +1$, which corresponds to our previous restriction to a single type
of coordinate system (i.e., right-handed). We shall see in Sect. 18.3.1 that
the rotation of coordinate axes whose transformation coefficients give rise to
$\det[R_{kl}] = -1$ yields the left-handed system, which for the moment is beyond
our scope.


Fig. 18.2. Difference between a passive (a) and an active (b) transformation


18.1.7 Determinant of a Matrix 


We close this section by reviewing the formal definition of the determinant
of a matrix and some related notions, as a preliminary for the exercises below.


1. The determinant

$$D = \det[a_{ij}] = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \tag{18.11}$$

of the square array of $n^2$ numbers (elements) $a_{ij}$ is the sum of the $n!$ terms

$$(-1)^r a_{1k_1} a_{2k_2} \cdots a_{nk_n}, \tag{18.12}$$

each corresponding to one of the $n!$ different ordered sets $k_1, k_2, \cdots, k_n$
obtained by $r$ interchanges of elements from the ordered set $1, 2, \cdots, n$.


2. The minor (or complementary minor) $M_{ij}$ of the element $a_{ij}$ in the
$n$th-order determinant $D = \det[a_{ij}]$ is the $(n-1)$th-order determinant
obtained from (18.11) on erasing the $i$th row and the $j$th column.


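Example Given a third-order determinant

$$D = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix},$$

the minor of $a_{12}$ is obtained by removing the first row and the second column
in $D$, as expressed by

$$M_{12} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}.$$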




3. The cofactor $C_{ij}$ of the element $a_{ij}$ is defined by

$$C_{ij} = \frac{\partial D}{\partial a_{ij}},$$

or equivalently,

$$C_{ij} = (-1)^{i+j} M_{ij}.$$

4. A determinant $D$ may be represented in terms of the elements and
cofactors of any one row or column by

$$D = \sum_{i=1}^{n} a_{ij}C_{ij} = \sum_{k=1}^{n} a_{jk}C_{jk}. \tag{18.13}$$

This is called simple Laplace development of a determinant $D$. (The
proof is given in Exercise 2.) The expression (18.13) gives the same value
for $D$ regardless of the column or row [i.e., no matter what value of $j$ in
(18.13)] we choose in the expansion. Note also that for $j \neq h$,

$$\sum_{i=1}^{n} a_{ij}C_{ih} = \sum_{k=1}^{n} a_{jk}C_{hk} = 0.$$


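Example The expansion by the first row gives

$$\begin{vmatrix} 1 & 3 & 0 \\ 2 & 6 & 4 \\ -1 & 0 & 2 \end{vmatrix} = 1 \begin{vmatrix} 6 & 4 \\ 0 & 2 \end{vmatrix} - 3 \begin{vmatrix} 2 & 4 \\ -1 & 2 \end{vmatrix} + 0 \begin{vmatrix} 2 & 6 \\ -1 & 0 \end{vmatrix} = -12.$$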


Remark. In view of the expansion (18.13), an $n$th-order determinant $D$ is
represented by a linear combination of $n$ determinants of order $n-1$.
Similarly, each of the latter is in turn represented
in terms of $n-1$ determinants of order $n-2$, and so on. Proceeding
successively, we finally arrive at $n!$ first-order determinants
(i.e., just $n!$ real numbers), each of which is expressed by (18.12).
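
The two descriptions of a determinant given above, the sum over permutations and the Laplace development, can be compared directly. The following is a small sketch in plain Python; the function names are illustrative, and the sign $(-1)^r$ is obtained from the number of inversions of the permutation, which has the same parity as the number of interchanges $r$.

```python
from itertools import permutations


def det_by_permutations(a):
    """Sum of the n! signed products a[0][k0] * a[1][k1] * ... over permutations."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        # The parity of the inversion count equals the parity of the
        # number of interchanges needed to build the permutation.
        inversions = sum(1 for p in range(n) for q in range(p + 1, n)
                         if perm[p] > perm[q])
        term = (-1) ** inversions
        for row in range(n):
            term *= a[row][perm[row]]
        total += term
    return total


def submatrix(a, i, j):
    """Array left after erasing row i and column j (0-based)."""
    return [[a[r][c] for c in range(len(a)) if c != j]
            for r in range(len(a)) if r != i]


def det_by_laplace(a, row=0):
    """Laplace development along the chosen row (0-based)."""
    n = len(a)
    if n == 1:
        return a[0][0]
    return sum(a[row][k] * (-1) ** (row + k) * det_by_laplace(submatrix(a, row, k))
               for k in range(n))


sample = [[1, 3, 0],
          [2, 6, 4],
          [-1, 0, 2]]
print(det_by_permutations(sample))                       # -12
print([det_by_laplace(sample, row) for row in range(3)])  # [-12, -12, -12]
```

Both routines are meant for small illustrative arrays only; their cost grows like $n!$, in line with the count of terms noted in the Remark.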


Exercises 


1. Check the validity of the orthogonal conditions (18.6) for the
transformation coefficients $R_{jk}$ in two dimensions.
Solution: For instance, if we set $j = 1$ and $k = 2$ in $R_{ji}R_{ki} = \delta_{jk}$, then

$$R_{1i}R_{2i} = R_{11}R_{21} + R_{12}R_{22} = -\cos\theta\sin\theta + \cos\theta\sin\theta = 0,$$

or if $j = k = 2$, we have

$$R_{2i}R_{2i} = R_{21}R_{21} + R_{22}R_{22} = (-\sin\theta)^2 + \cos^2\theta = 1.$$

Other equations can be proved in a similar way. ♣




2. Show that the expression (18.13) for a determinant

$$D := \det[a_{ij}] = \sum_{k=1}^{n} a_{ik}C_{ik} \quad \text{and} \quad D = \sum_{i=1}^{n} a_{ij}C_{ij}$$

yields the same value of $D$ no matter which row $i$ (or column $j$) we choose.
Solution: We prove only the first formula, since the proof of the
second is quite similar to that of the first.

It easily follows that the statement is true for a second-order
determinant, for which the expansion with fixed $i = 1$, $a_{11}a_{22} +
a_{12}(-a_{21})$, and that with $i = 2$, $a_{21}(-a_{12}) + a_{22}a_{11}$, give the same
value. By mathematical induction, we tentatively assume that the
statement is true for an $(n-1)$th-order determinant and try to
prove that it is also true for an $n$th-order determinant.

To do so, we expand $D$ in terms of each of two arbitrary
rows, say, the $i$th and the $j$th row with $i < j$, and compare the
results.


(i) Let us first expand $D$ by its $i$th row. A typical term in the
expansion by the $i$th row reads

$$a_{ik}C_{ik} = a_{ik} \cdot (-1)^{i+k} M_{ik}, \tag{18.14}$$

where $i$ is fixed and $k$ runs from 1 to $n$. Since the minor $M_{ik}$ of
$a_{ik}$ in $D$ is an $(n-1)$th-order determinant, owing to the induction
hypothesis it can be expanded by any row. Expand $M_{ik}$ by its
$(j-1)$th row. This row corresponds to the $j$th row of $D$, because
$M_{ik}$ does not contain elements of the $i$th row of $D$ and $i < j$.
Hence, the expansion of $M_{ik}$ by its $(j-1)$th row consists of the
linear combination of the elements $a_{jl}$ with $l = 1, 2, \cdots, k-1, k+1,
\cdots, n$ (i.e., $l \neq k$). We distinguish between the two cases, $l < k$
and $l > k$, as follows.

For $l < k$, the element $a_{jl}$ belongs to the $l$th column of $M_{ik}$.
Hence, the term involving $a_{jl}$ in the expansion of $M_{ik}$ reads

$$a_{jl} \cdot (\text{cofactor of } a_{jl} \text{ in } M_{ik}) = a_{jl} \cdot (-1)^{(j-1)+l} M_{ikjl}. \tag{18.15}$$

Here $M_{ikjl}$ is the minor of $a_{jl}$ in $M_{ik}$, which is obtained from $D$
by deleting the $i$th and $j$th rows and the $k$th and $l$th columns
of $D$. Then it follows from (18.14) and (18.15) that the resulting
terms in the expansion of $D$ are of the form

$$a_{ik}a_{jl} \cdot (-1)^{b} M_{ikjl} \quad \text{with} \quad b = i + k + j + l - 1. \tag{18.16}$$

If $l > k$, the only difference is that $a_{jl}$ belongs to the $(l-1)$th
column of $M_{ik}$ because $M_{ik}$ does not contain elements of the $k$th
column of $D$ and $k < l$. This results in an additional minus sign
in (18.15) and instead of (18.16) we obtain $a_{ik}a_{jl} \cdot (-1)^{b+1} M_{ikjl}$
with the same value of $b$. In short, the expansion of $D$ by the $i$th
row yields

$$D = \sum_{k=1}^{n} a_{ik}C_{ik} = \sum_{k=1}^{n} \left[ \sum_{l=1}^{k-1} (-1)^{b} a_{ik}a_{jl}M_{ikjl} + \sum_{l=k+1}^{n} (-1)^{b+1} a_{ik}a_{jl}M_{ikjl} \right] \tag{18.17}$$

with $b = i + k + j + l - 1$.
(ii) We next expand $D$ by the $j$th row. A typical term in this
expansion is

$$a_{jl}C_{jl} = a_{jl} \cdot (-1)^{j+l} M_{jl}. \tag{18.18}$$

By the induction hypothesis, we may expand the minor $M_{jl}$ of $a_{jl}$
in $D$ by its $i$th row, which corresponds to the $i$th row of $D$ since
$i < j$.

For $k > l$, the element $a_{ik}$ in that row belongs to the $(k-1)$th
column of $M_{jl}$, because $M_{jl}$ does not contain elements of the $l$th
column of $D$ and $l < k$. Hence, the term involving $a_{ik}$ in this
expansion is

$$a_{ik} \cdot (\text{cofactor of } a_{ik} \text{ in } M_{jl}) = a_{ik} \cdot (-1)^{i+(k-1)} M_{ikjl}, \tag{18.19}$$

where the minor $M_{ikjl}$ of $a_{ik}$ in $M_{jl}$ is obtained by deleting the
$i$th and $j$th rows and the $k$th and $l$th columns of $D$, and is thus
identical to $M_{ikjl}$ in (18.15). It follows from (18.18) and (18.19)
that this yields a representation whose terms are identical to those
given by (18.16) when $l < k$.

For $k < l$, the element $a_{ik}$ belongs to the $k$th column of $M_{jl}$,
so we get an additional minus sign and the result agrees with the
$l > k$ terms in (18.17). Hence, we conclude that the expansion of
$D$ by the $j$th row (with $j > i$), $\sum_{k=1}^{n} a_{jk}C_{jk}$, is identical to the expansion
(18.17).

The conclusions from the discussions in (i) and (ii) clearly show
that the two expansions of $D$ consist of the same terms, which
completes our proof of the given statement. ♣


3. Let $b_{kj} = C_{jk}/D$, where $C_{jk}$ is the cofactor of $a_{jk}$ in $D = \det[a_{jk}]$. Show
that

$$b_{kj}a_{lk} = \delta_{jl} \quad \text{and} \quad b_{kj}a_{jl} = \delta_{kl}, \tag{18.20}$$

which means that the matrix $[b_{kj}]$ is the inverse of $[a_{jk}]$.


Solution: It follows that

$$b_{kj}a_{lk} = \frac{C_{j1}}{D}a_{l1} + \frac{C_{j2}}{D}a_{l2} + \cdots + \frac{C_{jn}}{D}a_{ln} = \frac{1}{D}\sum_{k=1}^{n} a_{lk}C_{jk}. \tag{18.21}$$

The discussion in Exercise 2 tells us that the sum in the numerator
equals $D$ when $l = j$ regardless of the value of $j$. Hence, we have

$$b_{kj}a_{lk} = 1 \quad \text{if } l = j.$$

We next consider the case of $j \neq l$. To do this we replace the
elements in the $j$th row of $D$ by those in the $l$th row of $D$ (with $l \neq j$).
The resulting determinant, denoted by $\tilde{D}$, reads

$$\tilde{D} = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ \vdots & \vdots & & \vdots \\ a_{j-1,1} & a_{j-1,2} & \cdots & a_{j-1,n} \\ a_{l1} & a_{l2} & \cdots & a_{ln} \\ a_{j+1,1} & a_{j+1,2} & \cdots & a_{j+1,n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} = 0,$$

since $\tilde{D}$ has two identical rows. Note that the expansion of $\tilde{D}$ by its
$j$th row is $\tilde{D} = \sum_{p=1}^{n} a_{lp}C_{jp}$, which equals the sum in the numerator
in (18.21). It thus follows that

$$b_{kj}a_{lk} = 0 \quad \text{if } j \neq l.$$

These arguments complete the proof of the first equation. The
second can be verified in the same manner. ♣


4. Consider a matrix $[Q_{kj}]$ defined by $Q_{kj} = C_{jk}/\det[R_{jk}]$, where the $R_{jk}$
are the transformation coefficients of a rigid rotation of axes and the $C_{jk}$
are the cofactors of $R_{jk}$ in $\det[R_{jk}]$. Prove that $Q_{kj} = R_{jk}$.

Solution: Apply the result of Exercise 3 to find that $Q_{kj}R_{lk} = \delta_{lj}$.
Multiplying both sides by $R_{lm}$ and summing with respect to $l$, we
arrive at

$$Q_{kj}R_{lk}R_{lm} = \delta_{lj}R_{lm} = R_{jm}.$$

Then, the orthogonal relation $R_{lk}R_{lm} = \delta_{km}$ implies that $Q_{kj}\delta_{km} =
Q_{mj} = R_{jm}$. ♣


5. Show that $\det[R_{jk}] = \pm 1$.
Solution: It follows from (18.20) that

$$\det[Q_{kj}R_{jl}] = \det[\delta_{kl}] = 1,$$

where the $Q_{kj}$ are the same quantities as in Exercise 4. From
elementary linear algebra, we find that

$$\det[Q_{kj}R_{jl}] = \det[Q_{kj}]\det[R_{jl}] = \left(\det[R_{jl}]\right)^2,$$

where the identity $Q_{kj} = R_{jk}$ was used to obtain the last term.
Combining the two results above, we obtain $\det[R_{jk}] = \pm 1$. ♣
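
The chain of results in Exercises 3 to 5 can be checked numerically. The following sketch assumes a sample rotation about the $x_3$-axis and a reflected variant of it; the function names are hypothetical and the cofactor routine is a direct, unoptimized transcription of the definitions.

```python
import math


def det(a):
    """Determinant by Laplace development along the first row."""
    if len(a) == 1:
        return a[0][0]
    return sum((-1) ** k * a[0][k] * det([row[:k] + row[k + 1:] for row in a[1:]])
               for k in range(len(a)))


def cofactor(a, j, k):
    """Cofactor C_jk: signed determinant with row j and column k erased."""
    sub = [row[:k] + row[k + 1:] for r, row in enumerate(a) if r != j]
    return (-1) ** (j + k) * det(sub)


def inverse_by_cofactors(a):
    """Matrix b with b[k][j] = C_jk / D, as in Exercise 3."""
    n, d = len(a), det(a)
    return [[cofactor(a, j, k) / d for j in range(n)] for k in range(n)]


def rotation_about_x3(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def is_transpose(Q, R, tol=1e-12):
    """True if Q[k][j] equals R[j][k] for all j, k."""
    n = len(R)
    return all(abs(Q[k][j] - R[j][k]) < tol for j in range(n) for k in range(n))


R = rotation_about_x3(0.4)
# Reversing the first new axis gives an orthogonal matrix with determinant -1.
P = [[-x for x in R[0]], R[1], R[2]]
for M in (R, P):
    print(round(det(M), 12), is_transpose(inverse_by_cofactors(M), M))
```

In both cases the cofactor inverse coincides with the transpose, while the determinant takes the value $+1$ for the rotation and $-1$ for the reflected variant.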


18.2 Cartesian Tensors 


18.2.1 Cartesian Vectors 


Having dealt with the rotation of coordinate axes, we are ready to introduce
the concept of tensors and their transformation law in terms of Cartesian
coordinate systems. Assume an ordered set of three quantities $v_i$ ($i = 1, 2, 3$)
that are explicit or implicit functions of $x_j$. Let us see how the values of
$v_i(x_j)$ change through a rigid rotation of the Cartesian axes. If they transform
according to the law given below, the quantities $v_i$ are called the components
of a particular kind of tensor, i.e., of a Cartesian vector (or a first-order
Cartesian tensor).


♠ Cartesian vectors:

A Cartesian vector $\mathbf{v}$ is an object represented by an ordered set of $n$
functions $v_i(x_j)$ in terms of the $x$-coordinate system and by another set of
$n$ functions $v'_i(x'_j)$ in terms of the $x'$-coordinate system, where $v'_i$ and $v_i$
at each point are related by the transformation law:

$$v'_i = R_{ij}v_j, \qquad v_i = R_{ji}v'_j, \tag{18.22}$$

where $R_{ij} = \mathbf{e}'_i \cdot \mathbf{e}_j$.


Obviously, a vector $\mathbf{v}$ is a geometric object (like an arrow) so that it is uniquely
determined independently of the coordinate system. On the other hand, the
functional form of the components $v_i(x_j)$ depends on our choice of coordinate
system, even when we consider only the same vector $\mathbf{v}$. This is why the
concepts of a vector and its components are inherently different from each other.
(See also Sect. 18.2.2 on this point.)

We emphasize again that the index $i$ of $R_{ij}$ in (18.22) refers to the dashed
(transformed) function $v'_i$, whereas the $j$ refers to the undashed (original)
one $v_j$. In the following, we consider several examples of ordered sets of
functions $v_i$ in two dimensions, which may or may not be a first-order Cartesian
tensor.


Examples 1. The ordered set of functions $(v_i)$ ($i = 1, 2$) with the components
$v_1 = x_2$ and $v_2 = -x_1$.
Using the transformation law of coordinates $x'_i = R_{ij}x_j$, we obtain the
following for each function:

$$v'_1 = x'_2 = R_{21}x_1 + R_{22}x_2 = -x_1\sin\theta + x_2\cos\theta,$$
$$v'_2 = -x'_1 = -R_{11}x_1 - R_{12}x_2 = -x_1\cos\theta - x_2\sin\theta.$$

On the other hand, the functions $v'_i$ should be obtained from $v_i$ through
the transformation law as

$$v'_1 = R_{1k}v_k = v_1\cos\theta + v_2\sin\theta = x_2\cos\theta - x_1\sin\theta,$$
$$v'_2 = R_{2k}v_k = -v_1\sin\theta + v_2\cos\theta = -x_2\sin\theta - x_1\cos\theta.$$

The two expressions for $v'_1$ and $v'_2$ are identical to one another regardless
of the value of $\theta$. Therefore, the pair of functions $v_i(x_j)$ are components
of a Cartesian vector.


2. The set $v_i$ with $v_1 = x_2$ and $v_2 = x_1$.

Following the same procedure as above, we have

$$v'_1 = x'_2 = -s x_1 + c x_2,$$
$$v'_2 = x'_1 = c x_1 + s x_2$$

and

$$v'_1 = c v_1 + s v_2 = c x_2 + s x_1,$$
$$v'_2 = -s v_1 + c v_2 = -s x_2 + c x_1,$$

where $c$ and $s$ represent $\cos\theta$ and $\sin\theta$, respectively. These two sets of
expressions do not agree with each other. Hence, the pair $(x_2, x_1)$ is not
a first-order Cartesian tensor.
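
The comparison carried out in the two examples can be mimicked numerically: evaluate the same functional form at the rotated coordinates, and compare with the values demanded by the transformation law. The sketch below is illustrative only; `transforms_as_vector` is a hypothetical helper that tests a single point and angle, so a `True` result is a necessary condition and not a proof.

```python
import math


def rotation(theta):
    """Two-dimensional transformation coefficients R[i][j]."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]


def transforms_as_vector(v, theta, x, tol=1e-12):
    """Compare v evaluated at x' with R_ij v_j(x) at one sample point."""
    R = rotation(theta)
    x_new = [sum(R[i][j] * x[j] for j in range(2)) for i in range(2)]
    direct = v(x_new)                 # same functional form, new coordinates
    old = v(x)
    by_law = [sum(R[i][j] * old[j] for j in range(2)) for i in range(2)]
    return all(abs(d - b) < tol for d, b in zip(direct, by_law))


def example_1(x):
    return [x[1], -x[0]]              # v1 = x2, v2 = -x1


def example_2(x):
    return [x[1], x[0]]               # v1 = x2, v2 = x1


print(transforms_as_vector(example_1, 0.3, [1.0, 2.0]))  # True
print(transforms_as_vector(example_2, 0.3, [1.0, 2.0]))  # False
```

A single failing point, as for the second pair, is already enough to rule out the vector property.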


18.2.2 A Vector and a Geometric Arrow 


The result of Example 2 in Sect. 18.2.1 might be confusing for some readers;
the functions $v_i(x_1, x_2)$ given there are not components of a vector, although
they appear to represent a geometric arrow in the $(x_1$-$x_2)$-plane. To make this
point clear, we have to comment on the difference between the formal
definition of a vector as a first-order tensor and our familiar definition of a vector
as a geometric arrow.

In elementary calculus, vectors are simply defined by a geometric arrow
with certain length and direction, commonly denoted by a bold-face letter,
say, $\mathbf{v}$. Owing to this definition, $\mathbf{v}$ is uniquely determined by specifying its
length and direction, which are both independent of our choice of coordinate
systems. However, the uniqueness disappears if it is defined algebraically by
specifying its components $v_i$ relative to given coordinate axes. The values
of the components $v_i$ depend on our choice of coordinate system even when
the same arrow $\mathbf{v}$ is considered. Hence, when we apply a coordinate
transformation, the values of $v_i$ are altered in a way that preserves the length and
direction of the arrow $\mathbf{v}$, according to (18.22), which is why we call the set of
$n$ functions $v_i$ not a vector, but the components of a vector.

In short, we should always keep in mind that a vector is a geometric
object independent of coordinate systems, whereas components of a vector
are just mathematical representations of the vector with reference to a specific
coordinate system. This caution applies to all the classes of tensors presented
throughout this section.


Remark. Despite the above caution, we sometimes call a set of components of 
a tensor just a “tensor” to shorten our sentences. However, it is important to 
note an inherent difference between a tensor (=coordinate-independent object) 
and components of a tensor (=coordinate-dependent quantities). 


18.2.3 Cartesian Tensors 


We turn to a second-order Cartesian tensor that requires two subscripts 
to identify a particular element of the set. 


♠ Second-order Cartesian tensor:

A second-order Cartesian tensor $\mathbf{T}$ is an object represented by an ordered
set of two-index quantities $T_{ij}$ in terms of the $x$-coordinate system and by
another set of quantities $T'_{ij}$ in terms of the $x'$-coordinate system, where
$T_{ij}$ and $T'_{ij}$ at each point are related by

$$T'_{ij} = R_{ik}R_{jl}T_{kl}, \qquad T_{kl} = R_{mk}R_{nl}T'_{mn}. \tag{18.23}$$

Here, the two-index quantities $T_{ij}$ are called components of the tensor
$\mathbf{T}$.


In a similar way, we may define a Cartesian tensor of general order as follows:
The set of expressions $T_{ij\cdots k}$ are the components of a Cartesian tensor if, for
all rotations of the axes, the expressions using the new coordinates, $T'_{ij\cdots k}$,
are given by

$$T'_{ij\cdots k} = R_{il}R_{jm} \cdots R_{kn}T_{lm\cdots n}$$

and

$$T_{lm\cdots n} = R_{il}R_{jm} \cdots R_{kn}T'_{ij\cdots k}.$$

It is apparent that an $n$th-order Cartesian tensor in three dimensions has $3^n$
components.


Example Assume two Cartesian vectors $\mathbf{a}$ and $\mathbf{b}$, each of which is represented
by the components $a_i$ and $b_i$ associated with the same coordinate system.
Then, it is possible to create nine products of the components expressed by

$$a_j b_k \quad (j, k = 1, 2, 3),$$

which is called an outer product (or direct product) of the vectors $\mathbf{a}$
and $\mathbf{b}$ (see also Sect. 18.4.3). The outer product forms a second-order
Cartesian tensor. In fact, since each $a_i$ and $b_j$ transforms as

$$a'_i = R_{ik}a_k \quad \text{and} \quad b'_j = R_{jl}b_l,$$

we have

$$T'_{ij} = a'_i b'_j = R_{ik}R_{jl}a_k b_l = R_{ik}R_{jl}T_{kl}.$$

Remark. We emphasize that transformation coefficients, say, the $R_{ij}$, do not
form a tensor and note the fact that the two indices $i$ and $j$ in the tensor
$T_{ij}$ refer to the same coordinate system, whereas those in the coefficients
$R_{ij}$ refer to different coordinate systems. Hence, $T_{ij}$ and $R_{ij}$ are inherently
different from each other, though both require two indices.
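
That the outer product of two vectors obeys the second-order transformation law can be confirmed on sample numbers. The sketch below assumes a rotation of the axes about $x_3$ and arbitrary sample components; all helper names are illustrative.

```python
import math


def rotation_about_x3(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_vector(R, v):
    """v'_i = R_ik v_k."""
    return [sum(R[i][k] * v[k] for k in range(3)) for i in range(3)]


def transform_tensor2(R, T):
    """T'_ij = R_ik R_jl T_kl."""
    return [[sum(R[i][k] * R[j][l] * T[k][l] for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]


def outer(a, b):
    """Nine products a_j b_k arranged as a 3 x 3 array."""
    return [[a[j] * b[k] for k in range(3)] for j in range(3)]


R = rotation_about_x3(1.1)
a, b = [1.0, -2.0, 0.5], [3.0, 0.0, 4.0]
lhs = outer(transform_vector(R, a), transform_vector(R, b))  # a'_i b'_j
rhs = transform_tensor2(R, outer(a, b))                      # R_ik R_jl a_k b_l
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
print(max_diff < 1e-12)  # True
```

The two routes agree up to rounding error, which is the numerical counterpart of the one-line derivation in the example.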


18.2.4 Scalars 


Contrary to the case of finite-order tensors, we now consider quantities that 
are unchanged by a rotation of axes, which are called scalars or tensors of 
zero order and contain only one component. The most obvious example is 
the square of the distance of a point from the origin, which must be invariant 
under any rotation of coordinate axes. Other examples of scalars are presented 
below. 


Examples 1. We show that the scalar product $\mathbf{u} \cdot \mathbf{v}$ is invariant under
rotation. In the original (unprimed) system, the scalar product is given
in terms of components by $u_i v_i$ and in the rotated (primed) system, it is
given by

$$u'_i v'_i = (R_{ij}u_j)(R_{ik}v_k) = R_{ij}R_{ik}u_j v_k = u_j v_k \delta_{jk} = u_j v_j, \tag{18.24}$$

where the orthogonal relation $R_{ij}R_{ik} = \delta_{jk}$ in (18.6) was used. Since
the expression in the rotated system is the same as that in the original
system, the scalar product is indeed invariant under rotations.


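2. If the $v_i$ are the components of a vector, the divergence $\nabla \cdot \mathbf{v} = \partial v_i/\partial x_i$
is a scalar. This is proven as follows: In the rotated coordinate
system, $\nabla \cdot \mathbf{v}$ is given by

$$\frac{\partial v'_i}{\partial x'_i} = \frac{\partial}{\partial x'_i}(R_{ik}v_k) = R_{ik}\frac{\partial v_k}{\partial x'_i},$$

where the elements $R_{ik} = \mathbf{e}_k \cdot \mathbf{e}'_i$ are not functions of position. Using the
relation $\partial x_j/\partial x'_i = R_{ij}$ (see Exercise 2 below), we have

$$R_{ik}\frac{\partial v_k}{\partial x'_i} = R_{ik}\frac{\partial x_j}{\partial x'_i}\frac{\partial v_k}{\partial x_j} = R_{ik}R_{ij}\frac{\partial v_k}{\partial x_j} = \delta_{kj}\frac{\partial v_k}{\partial x_j} = \frac{\partial v_k}{\partial x_k}.$$

Finally, we obtain

$$\frac{\partial v'_i}{\partial x'_i} = \frac{\partial v_i}{\partial x_i}.$$
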


Exercises


1. Examine whether or not the ordered set of functions $v_i$ defined by $v_1 =
(x_1)^2$ and $v_2 = (x_2)^2$ constitute a vector. Here $(x_1)^2$ means the square of
$x_1$.
Solution: To examine the first function, $v_1$, alone is sufficient to
show that this pair is not a vector. Evaluating $v'_1$ directly gives

$$v'_1 = (x'_1)^2 = c^2(x_1)^2 + 2cs\,x_1 x_2 + s^2(x_2)^2,$$

whereas (18.8) requires that $v'_1 = c v_1 + s v_2 = c(x_1)^2 + s(x_2)^2$,
which is different from the above. ♣


2. Show that

$$R_{ij} = \frac{\partial x'_i}{\partial x_j} = \frac{\partial x_j}{\partial x'_i}$$

in Cartesian coordinate systems.
Solution: Set $x'_i\mathbf{e}'_i = x_j\mathbf{e}_j$ and differentiate both sides with
respect to $x_j$ to obtain $(\partial x'_i/\partial x_j)\mathbf{e}'_i = \mathbf{e}_j$. Taking the scalar
product of both sides with $\mathbf{e}'_k$ yields

$$\frac{\partial x'_i}{\partial x_j}\,\mathbf{e}'_i \cdot \mathbf{e}'_k = \frac{\partial x'_i}{\partial x_j}\,\delta_{ik} = \frac{\partial x'_k}{\partial x_j} \quad \text{and} \quad \mathbf{e}_j \cdot \mathbf{e}'_k = R_{kj}.$$

Hence, we have $R_{kj} = \partial x'_k/\partial x_j$. Similarly, if we differentiate by
$x'_k$ (instead of $x_j$) the first identity yields $R_{kj} = \partial x_j/\partial x'_k$. ♣


3. Show that the gradient of a vector $\mathbf{v}$, denoted by $\nabla\mathbf{v}$, is a second-order
tensor.
Solution: Suppose that $v_i$ represents the components of a vector
$\mathbf{v}$ and consider the quantities generated by $T_{ij} = \partial v_i/\partial x_j$ ($i, j =
1, 2, 3$). These nine quantities form the components of a
second-order tensor, as can be seen from the fact that

$$T'_{ij} = \frac{\partial v'_i}{\partial x'_j} = \frac{\partial (R_{ik}v_k)}{\partial x_l}\frac{\partial x_l}{\partial x'_j} = R_{ik}\frac{\partial v_k}{\partial x_l}R_{jl} = R_{ik}R_{jl}T_{kl}. \; ♣$$


Remark. The concept (and its notation) $\nabla\mathbf{v}$ introduced above is not in the
category of simple vector calculus. In fact, the quantity $\nabla\mathbf{v}$ is not a vector
like $\nabla \times \mathbf{v}$ and $\nabla\phi$, but a second-order tensor.


18.3 Pseudotensors 


18.3.1 Improper Rotations 


So far our coordinate transformations have been restricted to rigid rotations
described by an orthogonal matrix $[R_{ij}]$ with the property

$$|R| = \det[R_{ij}] = +1.$$

Such transformations are called proper rotations. We now broaden our
discussion to include transformations that are described by an orthogonal matrix
$[R_{ij}]$ for which

$$|R| = -1.$$

The latter kind of transformations are called improper rotations (or
rotation with reflection). Below are two examples of improper rotations.
(a) Inversion

The most obvious example of an improper rotation is an inversion of the
coordinate axes through the origin represented by

$$\mathbf{e}'_i = -\mathbf{e}_i \quad \text{for all } i = 1, 2, 3.$$

In this case, a position arrow $\mathbf{x}$ is described in terms of the bases $\mathbf{e}_i$ and $\mathbf{e}'_i$,
respectively, by

$$\mathbf{x} = x_i\mathbf{e}_i \quad \text{and} \quad \mathbf{x} = x'_i\mathbf{e}'_i = x'_i(-\mathbf{e}_i).$$

Equating them, we obtain

$$x_i = -x'_i = -\delta_{ij}x'_j,$$

which shows that an inversion of axes is expressed by $R_{ij} = -\delta_{ij}$. In fact, its
determinant becomes

$$|R| = \begin{vmatrix} -1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{vmatrix} = -1.$$


(b) Reflection
Another example is a reflection that reverses the direction of one basis:

$$\mathbf{e}'_i = -\mathbf{e}_i \quad \text{for a specified } i.$$

For the reflection of the $x_1$-axis, e.g., we have

$$|R| = \begin{vmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{vmatrix} = -1.$$


Remark. Note that an inversion is different from a proper rotation only for 
an odd dimensionality. In a case of two or four dimensions, for instance, an 
inversion is the same as a proper rotation. In contrast, a reflection that changes 
the sign of only one coordinate is always different from a proper rotation 
regardless of the dimensionality. 
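
The dependence on dimensionality noted in the Remark follows from the fact that both transformations are diagonal, so their determinants are simple products of $\pm 1$. A minimal sketch (the function names are illustrative):

```python
def det_diagonal(diagonal):
    """Determinant of a diagonal matrix: the product of its diagonal entries."""
    product = 1
    for entry in diagonal:
        product *= entry
    return product


def inversion_det(n):
    """Inversion reverses all n axes: R_ij = -delta_ij."""
    return det_diagonal([-1] * n)


def reflection_det(n):
    """Reflection reverses a single axis and leaves the other n - 1 unchanged."""
    return det_diagonal([-1] + [1] * (n - 1))


for n in (2, 3, 4):
    print(n, inversion_det(n), reflection_det(n))
# 2 1 -1
# 3 -1 -1
# 4 1 -1
```

The inversion determinant is $(-1)^n$, so it equals $+1$ in even dimensions, whereas the single-axis reflection always gives $-1$.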




Through an improper rotation, our initial right-handed coordinate 
system is changed into a left-handed one. This is illustrated schematically in 
Fig. 18.3. The reader should note that such a change cannot be accomplished 
by any kind of proper rotation. 


18.3.2 Pseudovectors 


Regardless of whether it is proper or improper, any rotation described by $R_{ij}$
transforms the components $v_i$ of a vector $\mathbf{v}$ as

$$v'_i = R_{ij}v_j.$$

This is because any real physical vector $\mathbf{v}$ may be considered as a geometrical
object (i.e., an arrow in space), whose direction and magnitude cannot be
altered merely by describing it in terms of a different coordinate system.

It is, however, possible to define another type of vector $\mathbf{w}$ whose
components $w_i$ transform as

$$w'_i = R_{ij}w_j \quad \text{under proper rotations},$$
$$w'_i = -R_{ij}w_j \quad \text{under improper rotations},$$

or equivalently,

$$w'_i = |R|R_{ij}w_j.$$


In this case, the w; are no longer strictly the components of a true Cartesian 
vector. Rather, they are said to form the components of a pseudovector or 
a first-order Cartesian pseudotensor. A pseudovector may be alternatively 
referred to as an axial vector; correspondingly, a true vector may be called 
a polar vector. 


Fig. 18.3. Improper rotation: (b) inversion and (c) reflection of the right-handed
Cartesian coordinate system depicted in (a)




Remark. A pseudovector should not be considered a real geometric arrow 
in space, since its direction is reversed by an improper transformation of 
the coordinate axes. This is illustrated in Fig. 18.4, where the pseudovec- 
tor w is shown as a broken line to indicate that it is not a real physical 
vector. 


Below is a summary of the discussion above. 


♠ Vectors and pseudovectors:
Components $v_i$ of a vector $\mathbf{v}$ transform as

$$v'_i = R_{ij}v_j$$

under a rigid rotation of Cartesian axes, whereas components $w_i$ of a
pseudovector $\mathbf{w}$ transform as

$$w'_i = |R|R_{ij}w_j,$$

where $|R|$ is the determinant of the transformation matrix $[R_{ij}]$.


Hence, the difference between a vector and a pseudovector manifests when 
applying an improper rotation that yields |R| = —1. 

Pseudovectors occur frequently in physics, although this fact is not usually 
pointed out explicitly. Following are physical examples of pseudovectors. 


Examples The following three physical quantities are all pseudovectors. 


1. Angular momentum of a moving particle, $\mathbf{L} = \mathbf{r} \times \mathbf{p}$, where $\mathbf{r}$ is the
particle's position arrow and $\mathbf{p}$ its momentum vector.

2. Torque on a particle, $\mathbf{N} = \mathbf{r} \times \mathbf{F}$, where $\mathbf{r}$ is the particle's position arrow
and $\mathbf{F}$ the force acting on the particle.

3. Magnetic field, $\mathbf{B} = \nabla \times \mathbf{A}$, defined by the rotation of the vector potential
$\mathbf{A}$.


Fig. 18.4. Reversing behavior of the pseudovector w via the reflection of the e2-axis




It is noteworthy that each of these pseudovectors is formed as a vector
product of two vectors.
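
The sign reversal that distinguishes a vector product from a true vector can be seen on sample numbers. The sketch below assumes a reflection of the first axis and arbitrary sample components for $\mathbf{r}$ and $\mathbf{p}$; the helper names are illustrative.

```python
def cross(a, b):
    """Components of the vector product a x b."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def apply(R, v):
    """v'_i = R_ij v_j."""
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def det3(R):
    """Determinant of a 3 x 3 matrix, expanded along the first row."""
    return (R[0][0] * (R[1][1] * R[2][2] - R[1][2] * R[2][1])
            - R[0][1] * (R[1][0] * R[2][2] - R[1][2] * R[2][0])
            + R[0][2] * (R[1][0] * R[2][1] - R[1][1] * R[2][0]))


reflection = [[-1, 0, 0], [0, 1, 0], [0, 0, 1]]   # improper: |R| = -1
r, p = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]

L = cross(r, p)
L_new = cross(apply(reflection, r), apply(reflection, p))  # built from r', p'
as_true_vector = apply(reflection, L)                      # R_ij L_j
as_pseudovector = [det3(reflection) * c for c in as_true_vector]

print(L_new)            # [7.0, 0.5, 2.0]
print(as_true_vector)   # [-7.0, -0.5, -2.0]
print(as_pseudovector)  # [7.0, 0.5, 2.0]
```

The components computed from the transformed $\mathbf{r}$ and $\mathbf{p}$ agree with $|R|R_{ij}L_j$ and not with $R_{ij}L_j$, in line with the pseudovector transformation law.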


18.3.3 Pseudotensors 


We can extend the notion of vectors and pseudovectors to objects with two
or more subscripts. For instance, assume a quantity with components
transforming as

$$T'_{ij} = R_{ik}R_{jl}T_{kl}$$

under proper rotations, but

$$T'_{ij} = -R_{ik}R_{jl}T_{kl}$$

under improper ones. Then, the $T_{ij}$ are components of a second-order
Cartesian pseudotensor. Similarly, Cartesian pseudotensors of arbitrary
order are defined such that their components transform as

$$T'_{ij\cdots k} = |R|R_{il}R_{jm} \cdots R_{kn}T_{lm\cdots n},$$

where $|R|$ is the determinant of the transformation matrix $[R_{ij}]$.
Correspondingly, zeroth-order objects may also be divided into scalars and
pseudoscalars, the latter being invariant under proper rotations but changing sign under
improper rotations.


18.3.4 Levi-Civita Symbols
A typical example of a third-order pseudotensor is the Levi-Civita symbol
$\epsilon_{ijk}$:


♠ Levi-Civita symbol:

The Levi-Civita symbol (or the permutation symbol), denoted by
$\epsilon_{ijk}$, takes the values $+1$ and $-1$ if the ordered set $i, j, k$ is obtained by an
even or odd permutation, respectively, of the set $1, 2, 3$.


Actually, $\epsilon_{ijk}$ takes the values

$$\epsilon_{123} = \epsilon_{231} = \epsilon_{312} = +1,$$

$$\epsilon_{213} = \epsilon_{321} = \epsilon_{132} = -1,$$

and $\epsilon_{ijk} = 0$ if any two of the indices $i, j, k$ are equal.
The pseudotensor property of $\epsilon_{ijk}$ follows from a convenient notation for
the determinant $|A|$ of a general $3 \times 3$ matrix $[A_{ij}]$ (see Exercise 2):

$$|A|\epsilon_{lmn} = A_{il}A_{jm}A_{kn}\epsilon_{ijk}.$$



Certainly, this equation holds for the transformation matrix $[R_{ij}]$ for rigid
rotation. Hence, we have

$$|R|\epsilon_{lmn} = R_{il}R_{jm}R_{kn}\epsilon_{ijk},$$

or equivalently,

$$\epsilon_{ijk} = |R|R_{il}R_{jm}R_{kn}\epsilon_{lmn}. \tag{18.25}$$

This shows that $\epsilon_{ijk}$ is a third-order Cartesian pseudotensor.

The result (18.25) indicates more than the pseudotensorial character
of $\epsilon_{ijk}$. It clearly demonstrates that all of the components of $\epsilon_{ijk}$ are
unaltered by any rotation of axes. Tensors endowed with this property are
called isotropic tensors (invariant tensors or fundamental tensors).
It is known that there are no isotropic tensors of first order and that the
only ones of second and third order are scalar multiples of $\delta_{ij}$ and $\epsilon_{ijk}$,
respectively. Additionally, the most general isotropic tensor of fourth order is
given by

$$\lambda\delta_{ik}\delta_{mp} + \mu\delta_{im}\delta_{kp} + \nu\delta_{ip}\delta_{km},$$

with arbitrary constants $\lambda, \mu, \nu$. (Such a fourth-order isotropic tensor occurs
in the elasticity theory of solids; see Sect. 18.5.4.) All the isotropic tensors
above are relevant to the description of the physical properties of an isotropic
medium (i.e., a medium having the same properties regardless of the way in
which it is orientated).
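
Both the determinant formula and the invariance of $\epsilon_{ijk}$ can be verified exhaustively for small integer matrices. The sketch below uses 0-based indices, a closed-form expression for $\epsilon_{ijk}$ that is an implementation convenience and not taken from the text, and two hand-picked orthogonal matrices (an exchange of two axes and a cyclic relabelling) as assumed samples.

```python
from itertools import product


def levi_civita(i, j, k):
    """epsilon_ijk for 0-based indices i, j, k in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def det3(A):
    """|A| = A_i1 A_j2 A_k3 epsilon_ijk, summed over i, j, k."""
    return sum(A[i][0] * A[j][1] * A[k][2] * levi_civita(i, j, k)
               for i, j, k in product(range(3), repeat=3))


def determinant_identity_holds(A):
    """Check |A| eps_lmn == A_il A_jm A_kn eps_ijk for every l, m, n."""
    d = det3(A)
    for l, m, n in product(range(3), repeat=3):
        rhs = sum(A[i][l] * A[j][m] * A[k][n] * levi_civita(i, j, k)
                  for i, j, k in product(range(3), repeat=3))
        if d * levi_civita(l, m, n) != rhs:
            return False
    return True


def epsilon_is_unaltered(R):
    """Check eps_ijk == |R| R_il R_jm R_kn eps_lmn for every i, j, k."""
    d = det3(R)
    for i, j, k in product(range(3), repeat=3):
        rhs = d * sum(R[i][l] * R[j][m] * R[k][n] * levi_civita(l, m, n)
                      for l, m, n in product(range(3), repeat=3))
        if rhs != levi_civita(i, j, k):
            return False
    return True


A = [[2, 0, 1], [1, 3, -1], [0, 5, 4]]
swap = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # improper: exchanges two axes
cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # proper: cyclic relabelling of axes
print(det3(A), determinant_identity_holds(A))   # 39 True
print(det3(swap), epsilon_is_unaltered(swap))   # -1 True
print(det3(cycle), epsilon_is_unaltered(cycle)) # 1 True
```

Integer matrices are used so that every comparison is exact; for the improper example the factor $|R| = -1$ is what restores the original values of $\epsilon_{ijk}$.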


Exercises 


1. Show that an angular momentum $\mathbf{L} = \mathbf{r} \times \mathbf{p}$ is a pseudovector.


Solution: Since the position vector $\mathbf{r}$ and the momentum vector
$\mathbf{p}$ are vectors, they transform under any rotation of the axes
(proper or improper) as $r'_j = R_{jk}r_k$, $p'_m = R_{mn}p_n$. Hence, the
components of $\mathbf{L}$ in a new coordinate system read

$$\begin{aligned} L'_i &= \epsilon_{ijk}r'_j p'_k \\ &= \left(|R|R_{il}R_{jm}R_{kn}\epsilon_{lmn}\right)(R_{jq}r_q)(R_{ks}p_s) \\ &= |R|R_{il}(R_{jm}R_{jq})(R_{kn}R_{ks})\epsilon_{lmn}r_q p_s \\ &= |R|R_{il}\delta_{mq}\delta_{ns}\epsilon_{lmn}r_q p_s \\ &= |R|R_{il}\epsilon_{lmn}r_m p_n = |R|R_{il}L_l, \end{aligned}$$

which clearly indicates that the quantities $L_i$ form the components
of a first-order Cartesian pseudotensor (i.e., a pseudovector). ♣




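2. Determine whether $|A|\epsilon_{lmn} = A_{il}A_{jm}A_{kn}\epsilon_{ijk}$ holds for a general matrix
$[A_{ij}]$ in three dimensions.
Solution: Set $l = 1$, $m = 2$, $n = 3$, for instance, to find that the
right-hand side reads

$$\begin{aligned} A_{i1}A_{j2}A_{k3}\epsilon_{ijk} &= A_{11}A_{22}A_{33} + A_{21}A_{32}A_{13} + A_{31}A_{12}A_{23} \\ &\quad - A_{11}A_{32}A_{23} - A_{21}A_{12}A_{33} - A_{31}A_{22}A_{13} = |A|. \end{aligned}$$

Other cases can be proven in the same manner. ♣
3. Derive the identity: $\epsilon_{ijk}\epsilon_{klm} = \delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$.


Solution: We first note that the right-hand side of the above
identity, $\delta_{il}\delta_{jm} - \delta_{im}\delta_{jl}$, reads

$$\begin{cases} +1 & \text{if } i = l \text{ and } j = m \neq i, \quad (18.26) \\ -1 & \text{if } i = m \text{ and } j = l \neq i, \quad (18.27) \\ 0 & \text{otherwise.} \end{cases}$$

In the case of (18.26), the left-hand side of the desired identity is

$$\epsilon_{ijk}\epsilon_{klm} = \epsilon_{ijk}\epsilon_{kij} = (\epsilon_{ijk})^2. \tag{18.28}$$

Since $i \neq j$, (18.28) takes the value $+1$ when $k \neq i$ and $k \neq j$.
As a result, we successfully obtain the desired identity. A similar
procedure reveals that $\epsilon_{ijk}\epsilon_{klm} = -1$ in the case of (18.27) and 0
otherwise. ♣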


Remark. We should note that in (18.28), we have not summed with respect
to $i$ and $j$. This is because the second term in (18.28) was obtained by a
substitution of particular values into the subscripts $l$ and $m$, respectively.
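
Since each free index takes only three values, the identity of Exercise 3 can also be confirmed by brute force over all 81 combinations. The following sketch uses 0-based indices; the closed-form expression for $\epsilon_{ijk}$ is an implementation convenience, not part of the text.

```python
from itertools import product


def levi_civita(i, j, k):
    """epsilon_ijk for 0-based indices i, j, k in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2


def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0


def epsilon_delta_identity_holds():
    """Check eps_ijk eps_klm == d_il d_jm - d_im d_jl, with k summed."""
    for i, j, l, m in product(range(3), repeat=4):
        lhs = sum(levi_civita(i, j, k) * levi_civita(k, l, m) for k in range(3))
        rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
        if lhs != rhs:
            return False
    return True


print(epsilon_delta_identity_holds())  # True
```

Only the index $k$ is summed inside the loop; $i$, $j$, $l$ and $m$ are held fixed in each pass, in line with the Remark above.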


18.4 Tensor Algebra 


18.4.1 Addition and Subtraction 


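We demonstrate below the basics of tensor algebra that provide ways of
constructing new tensors from old ones. For convenience, we may simply refer to
$T_{ij}$ as the tensor, but it should always be remembered that the $T_{ij}$ are the
components of $\mathbf{T}$ in a specific coordinate system.

The addition and subtraction of tensors are defined in an obvious fashion.
If $A_{ij\cdots k}$ and $B_{ij\cdots k}$ are (the components of) tensors of the same order, then
their sum and difference, $S_{ij\cdots k}$ and $D_{ij\cdots k}$ respectively, are given by

$$S_{ij\cdots k} = A_{ij\cdots k} + B_{ij\cdots k},$$
$$D_{ij\cdots k} = A_{ij\cdots k} - B_{ij\cdots k},$$

for each set of values $i, j, \cdots, k$. Furthermore, the linearity of a rotation of
coordinates immediately yields

$$\begin{aligned} R_{ip}R_{jq} \cdots R_{kr}S_{pq\cdots r} &= R_{ip}R_{jq} \cdots R_{kr}\left(A_{pq\cdots r} + B_{pq\cdots r}\right) \\ &= R_{ip}R_{jq} \cdots R_{kr}A_{pq\cdots r} + R_{ip}R_{jq} \cdots R_{kr}B_{pq\cdots r} \\ &= A'_{ij\cdots k} + B'_{ij\cdots k} = S'_{ij\cdots k}, \end{aligned}$$

so that the sum is again a tensor of the same order (and likewise for the difference).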

18.4.2 Contraction 


Next is an operation peculiar to tensor algebra that is of considerable impor- 
tance in certain manipulations. 


♠ Contraction:
Contraction is an operation that makes two of the indices equal and
sums over all values of the equalized indices.


As an example, we consider a third-order tensor $T_{ijk}$ whose transformation
law is described by

$$T'_{ijk} = R_{il}R_{jm}R_{kn}T_{lmn}. \tag{18.29}$$

Now we perform a contraction of this tensor with respect to $j$ and $k$.
Setting $j = k$ in (18.29) and summing over $k$, we get

$$T'_{ikk} = R_{il}R_{km}R_{kn}T_{lmn} = R_{il}\delta_{mn}T_{lmn} = R_{il}T_{lnn},$$

where we used the orthogonality condition on the sum $R_{km}R_{kn}$. The result
indicates that the quantity $T_{ikk}$ forms the components of a tensor of order
$1 = 3 - 2$. In general, contraction reduces the order of a tensor by two;
contraction of an $N$th-order tensor $T_{ij\cdots l\cdots m\cdots k}$ by making the subscripts $l$ and
$m$ equal produces another tensor of order $N - 2$. In particular, if contraction
is applied to a tensor of order 2, the result is a scalar.
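
The statement that a contracted third-order tensor transforms as a vector can be tested on sample numbers: contract after rotating, and compare with rotating the contracted set. The sketch below assumes a rotation of the axes about $x_3$ and arbitrary illustrative components; none of the helper names come from the text.

```python
import math


def rotation_about_x3(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]


def transform_tensor3(R, T):
    """T'_ijk = R_il R_jm R_kn T_lmn."""
    rng = range(3)
    return [[[sum(R[i][l] * R[j][m] * R[k][n] * T[l][m][n]
                  for l in rng for m in rng for n in rng)
              for k in rng] for j in rng] for i in rng]


def contract_last_two(T):
    """Contraction over the second and third indices: T_ikk."""
    return [sum(T[i][k][k] for k in range(3)) for i in range(3)]


def transform_vector(R, v):
    """v'_i = R_il v_l."""
    return [sum(R[i][l] * v[l] for l in range(3)) for i in range(3)]


# Arbitrary sample components, chosen only for illustration.
T = [[[float(i + 2 * j - 3 * k + i * j * k) for k in range(3)]
      for j in range(3)] for i in range(3)]
R = rotation_about_x3(0.9)

lhs = contract_last_two(transform_tensor3(R, T))  # contract after rotating
rhs = transform_vector(R, contract_last_two(T))   # rotate the contracted set
print(max(abs(x - y) for x, y in zip(lhs, rhs)) < 1e-9)  # True
```

The agreement relies on the orthogonality of $R$; replacing `R` by a non-orthogonal matrix would generally break it.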


18.4.3 Outer and Inner Products 


Let us consider the multiplication of tensors. For example, we may take two
tensors $A_{ij}$ and $B_{klm}$ of different order and simply write them in juxtaposition:

$$C_{ijklm} = A_{ij}B_{klm}. \tag{18.30}$$

Then, the quantities $C_{ijklm}$ are the components of a tensor of fifth order, which
follows immediately from the transformation law of tensors. Such a product
as (18.30), in which all the indices are different from one another, is called an
outer product of tensors.
Another kind of product, known as the inner product of tensors,
is obtained from the outer product by contraction. For instance, putting $j = k$
in (18.30) results in

$$C_{ijjlm} = A_{ij}B_{jlm}, \tag{18.31}$$

which forms a third-order tensor as demonstrated in Sect. 18.4.2. Then,
the right-hand side of (18.31) is called an inner product of the components of
the tensors $A_{ij}$ and $B_{klm}$.


Examples The process of taking the scalar product of two vectors $\mathbf{u}$ and $\mathbf{v}$,
expressed by $u_i v_i$, can be recast into tensor language as forming the outer
product

$$T_{ij} = u_i v_j$$

and then contracting it to give

$$T_{ii} = u_i v_i.$$

Using the concept of outer (and inner) product of tensors, we can write
many familiar expressions of vector algebra as contracted tensors. For example,
the vector product $\mathbf{a} = \mathbf{b} \times \mathbf{c}$ has

$$a_i = \varepsilon_{ijk} b_j c_k$$

as its $i$th component, where $\varepsilon_{ijk}$ is the Levi-Civita symbol introduced in
Sect. 18.3.4. This notation clarifies the distinction between the pseudovector
consisting of the components $\varepsilon_{ijk} b_j c_k$ and the second-order tensor composed
of the outer product $b_j c_k$.
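
As an illustrative sketch (the function names are assumptions made here, and indices run over 0, 1, 2 rather than 1, 2, 3), the contraction $a_i = \varepsilon_{ijk} b_j c_k$ and the outer product $b_j c_k$ can be written directly as sums over components:

```python
def levi_civita(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}: +1, -1 or 0."""
    return (i - j) * (j - k) * (k - i) // 2

def cross(b, c):
    """Vector product as the double contraction a_i = eps_ijk b_j c_k."""
    return [sum(levi_civita(i, j, k) * b[j] * c[k]
                for j in range(3) for k in range(3))
            for i in range(3)]

def outer(b, c):
    """Second-order tensor T_jk = b_j c_k (outer product)."""
    return [[b[j] * c[k] for k in range(3)] for j in range(3)]

def scalar_product(u, v):
    """Contraction T_ii of the outer product T_ij = u_i v_j."""
    T = outer(u, v)
    return sum(T[i][i] for i in range(3))
```

For instance, `cross([1, 0, 0], [0, 1, 0])` gives the third basis direction, while `outer` keeps all nine products $b_j c_k$ without any contraction.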


Remark. The outer product of two vectors is often denoted without reference
to any coordinate system as

$$\mathbf{T} = \mathbf{u} \otimes \mathbf{v}. \tag{18.32}$$

This should not be confused with the vector product of two vectors, which is
itself a pseudovector and is discussed in Sect. 18.3.2. The expression (18.32)
gives the basis to which the components $T_{ij}$ of the second-order tensor refer:
since $\mathbf{u} = u_i \mathbf{e}_i$ and $\mathbf{v} = v_i \mathbf{e}_i$, we may write the tensor $\mathbf{T}$ as

$$\mathbf{T} = u_i \mathbf{e}_i \otimes v_j \mathbf{e}_j = u_i v_j\, \mathbf{e}_i \otimes \mathbf{e}_j = T_{ij}\, \mathbf{e}_i \otimes \mathbf{e}_j.$$

Furthermore, we have

$$\mathbf{T} = u_i \mathbf{e}_i \otimes v_j \mathbf{e}_j = u'_i \mathbf{e}'_i \otimes v'_j \mathbf{e}'_j,$$

which indicates that the quantities $T'_{ij}$ are the components of the same
tensor $\mathbf{T}$ but referred to a different coordinate system. These concepts can be
extended to higher-order tensors.


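We show below several expressions of vector algebra as contracted Cartesian
tensors: the notation $[\mathbf{a}]_i$ indicates that one takes the $i$th component of the
vector (or tensor) $\mathbf{a}$.


Examples
1. $\mathbf{a} \cdot \mathbf{b} = a_i b_i = \delta_{ij} a_i b_j.$


2. $\mathbf{a} \cdot (\mathbf{b} \times \mathbf{c}) = \delta_{i\ell}\, a_i [\mathbf{b} \times \mathbf{c}]_\ell = \delta_{i\ell}\, a_i (\varepsilon_{\ell jk} b_j c_k) = \varepsilon_{ijk} a_i b_j c_k.$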


BP VG oA Seis a 

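4. $[\nabla \times \mathbf{a}]_i = \varepsilon_{ijk} \dfrac{\partial a_k}{\partial x_j}.$

5. $[\nabla(\nabla \cdot \mathbf{v})]_i = \dfrac{\partial}{\partial x_i}(\nabla \cdot \mathbf{v}) = \dfrac{\partial}{\partial x_i}\left(\dfrac{\partial v_j}{\partial x_j}\right) = \dfrac{\partial^2 v_j}{\partial x_i \partial x_j}.$

6. $[\nabla \times (\nabla \times \mathbf{v})]_i = \varepsilon_{ijk} \dfrac{\partial}{\partial x_j} [\nabla \times \mathbf{v}]_k = \varepsilon_{ijk}\varepsilon_{k\ell m} \dfrac{\partial^2 v_m}{\partial x_j \partial x_\ell}.$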


18.4.4 Symmetric and Antisymmetric Tensors 


The order of subscripts attached to a tensor is important; in general, $T_{ij}$ is
not the same as $T_{ji}$. But there are some cases of interest as described below.


♠ Symmetric and antisymmetric tensors:
If

$$T_{ij} = T_{ji}$$

holds for all $i$ and $j$, the tensor composed of $T_{ij}$ is called a symmetric
tensor. If, instead,

$$T_{ij} = -T_{ji}, \tag{18.33}$$

the tensor is said to be antisymmetric (or skew-symmetric).


A tensor that is symmetric (or antisymmetric) in one coordinate system
remains symmetric (or antisymmetric) in any other coordinate system. In fact,
if $T_{ij}$ is symmetric in a given system, i.e., $T_{ij} = T_{ji}$, then

$$T'_{ij} = R_{ik} R_{j\ell} T_{k\ell} = R_{ik} R_{j\ell} T_{\ell k} = T'_{ji},$$

and similarly for antisymmetry and tensors of higher order.
Notably, every tensor can be resolved into symmetric and antisymmetric
parts by the identity

$$T_{ij} = S_{ij} + A_{ij}, \tag{18.34}$$

where

$$S_{ij} = \frac{1}{2}\left(T_{ij} + T_{ji}\right) \quad \text{and} \quad A_{ij} = \frac{1}{2}\left(T_{ij} - T_{ji}\right).$$

Evidently $S_{ij}$ is a symmetric tensor since it is unaltered even if $i$ and $j$ are
interchanged. In contrast, $A_{ij}$ is an antisymmetric tensor since the signs of
all the components are reversed by exchanging $i$ and $j$. Then, $S_{ij}$ and $A_{ij}$ are
called the symmetric and antisymmetric parts of $T_{ij}$, respectively.
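
A short sketch of the decomposition (18.34) for a sample $3 \times 3$ array follows; the function names and the sample values are illustrative assumptions, not taken from the text.

```python
def symmetric_part(T):
    """S_ij = (T_ij + T_ji) / 2."""
    n = len(T)
    return [[0.5 * (T[i][j] + T[j][i]) for j in range(n)] for i in range(n)]

def antisymmetric_part(T):
    """A_ij = (T_ij - T_ji) / 2."""
    n = len(T)
    return [[0.5 * (T[i][j] - T[j][i]) for j in range(n)] for i in range(n)]

# Sample components, chosen only for illustration
T = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 9.0]]
S = symmetric_part(T)
A = antisymmetric_part(T)

# Recombine the two parts to recover T_ij = S_ij + A_ij
recombined = [[S[i][j] + A[i][j] for j in range(3)] for i in range(3)]
```

Note that the antisymmetric part has only three independent components in three dimensions, which is the starting point of the next subsection.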


18.4.5 Equivalence of an Antisymmetric Second-Order Tensor 
to a Pseudovector 


It is noteworthy that in three dimensions, a second-order antisymmetric
tensor $\mathbf{W}$ is associated with a pseudovector $\mathbf{w}$. To see this, let the $W_{ij}$ be
components of an antisymmetric second-order tensor whose transformation
law reads

$$\begin{aligned}
W'_{ij} &= R_{i\ell} R_{jm} W_{\ell m} \\
&= R_{i1}R_{j2}W_{12} + R_{i1}R_{j3}W_{13} + R_{i2}R_{j1}W_{21} + R_{i2}R_{j3}W_{23} \\
&\quad + R_{i3}R_{j1}W_{31} + R_{i3}R_{j2}W_{32},
\end{aligned} \tag{18.35}$$

since $W_{11} = W_{22} = W_{33} = 0$. Moreover, since $W_{\ell m} = -W_{m\ell}$, we can reduce
(18.35) to the form

$$W'_{ij} = \sum_{(\ell,m)} \left(R_{i\ell}R_{jm} - R_{im}R_{j\ell}\right) W_{\ell m}, \tag{18.36}$$

where the sum $\sum_{(\ell,m)}$ restricts the values of $(\ell,m)$ to (1,2), (2,3), or (3,1).
Now we introduce the notation

$$w_1 = W_{23} = -W_{32}, \quad w_2 = W_{31} = -W_{13}, \quad w_3 = W_{12} = -W_{21},$$

or more concisely,

$$w_n = W_{\ell m},$$

where $\ell, m, n$ is a cyclic permutation of the numbers 1, 2, 3, i.e.,

$$(\ell, m, n) = (1, 2, 3),\ (2, 3, 1),\ (3, 1, 2).$$

Then (18.36) can be written as

$$w'_k = \sum_{(\ell,m,n)} \left(R_{i\ell}R_{jm} - R_{im}R_{j\ell}\right) w_n, \tag{18.37}$$

in which $i, j, k$ and $\ell, m, n$ are both cyclic permutations of 1, 2, 3.

Noteworthy is the fact that (18.37) is equivalent to the transformation
law of components $w_k$ of a pseudovector $\mathbf{w}$. After some algebra, we see that
equation (18.37) can be reduced to a more compact form as

$$w'_k = |R|\, R_{kn} w_n, \tag{18.38}$$

which is nothing but the transformation law of a pseudovector. [See Exercise
2 for the proof of (18.38).]
We have now arrived at the following theorem:


♠ Theorem:
Assume a second-order antisymmetric tensor in three dimensions, whose
components $W_{ij}$ take the form

$$[W_{ij}] = \begin{pmatrix} 0 & W_{12} & -W_{31} \\ -W_{12} & 0 & W_{23} \\ W_{31} & -W_{23} & 0 \end{pmatrix}.$$

Then, the three components $W_{12}$, $W_{31}$, and $W_{23}$ can be associated with
the pseudovector $\mathbf{w}$ whose components are given by

$$(w_1, w_2, w_3) = (W_{23}, W_{31}, W_{12}),$$

or more concisely,

$$w_i = \frac{1}{2}\varepsilon_{ijk} W_{jk}. \tag{18.39}$$


The right-hand side of (18.39) is a twice-contracted product of the third-order
pseudotensor, $\varepsilon_{ijk}$, and the second-order tensor, $W_{jk}$; hence, it is a pseudovector.
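
The correspondence (18.39) and its inverse, $W_{jk} = \varepsilon_{jki} w_i$, can be sketched as follows. This is only an illustration: indices run over 0, 1, 2, and the names `dual_vector` and `dual_tensor` are assumptions introduced here.

```python
def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def dual_vector(W):
    """w_i = (1/2) eps_ijk W_jk, cf. (18.39)."""
    return [0.5 * sum(eps(i, j, k) * W[j][k]
                      for j in range(3) for k in range(3))
            for i in range(3)]

def dual_tensor(w):
    """Inverse map W_jk = eps_jki w_i."""
    return [[sum(eps(j, k, i) * w[i] for i in range(3)) for k in range(3)]
            for j in range(3)]

# Antisymmetric sample with W_12 = 3, W_31 = 2, W_23 = 1 (1-based labels)
W = [[0.0, 3.0, -2.0],
     [-3.0, 0.0, 1.0],
     [2.0, -1.0, 0.0]]
w = dual_vector(W)
```

With this sample, `w` collects $(W_{23}, W_{31}, W_{12})$, and `dual_tensor(w)` rebuilds the original array, so no information is lost in passing between the two descriptions.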


Examples In physical applications, we often use the vector representation
(18.39) of a second-order antisymmetric tensor. For instance, let us consider
the equations of angular momentum of a moving particle with mass $m$. We
assume that a force $\mathbf{F}$ acts on the particle located at $\mathbf{x}$. Then, with $j$ and $k$
each taking the values 1, 2, 3 we get

$$m\left(\ddot{x}_j x_k - \ddot{x}_k x_j\right) = F_j x_k - F_k x_j, \tag{18.40}$$

which gives us nine equations. Note that both sides of (18.40) are antisymmetric
tensors. Among the nine equations, therefore, there are only three that
are independent, $(j,k) = (1,2), (2,3), (3,1)$. So we can convert (18.40) into a
more concise vector form as

$$m W_i = N_i,$$

where we have defined

$$W_i = \varepsilon_{ijk}\left(\ddot{x}_j x_k - \ddot{x}_k x_j\right) \quad \text{and} \quad N_i = \varepsilon_{ijk}\left(F_j x_k - F_k x_j\right).$$


18.4.6 Quotient Theorem 


Sometimes it is necessary to clarify whether a set of functions, say, $\{a_i(x_j)\}$,
forms the components of a vector or not. A direct method is to examine
whether the functions satisfy a required transformation law under a rotation
of axes, which is, however, troublesome in practice. In this subsection, we
describe an alternative and more efficient method, called the quotient law,
which is a simple indirect test for determining whether a given set of quantities
forms the components of a tensor.


♠ Quotient theorem:
If $a_i v_i$ is a scalar for a vector $\mathbf{v}$ in any rotated coordinate system, then
the $a_i$ constitute the components of a vector $\mathbf{a}$.


Proof Suppose that we are given a set of $n$ quantities $a_i$ subject to the condition
that $a_i v_i$ is a scalar for components $v_i$ of an arbitrary vector $\mathbf{v}$ in terms of
an arbitrarily rotated coordinate system. We may then write

$$a_i v_i = \phi, \tag{18.41}$$

in which $\phi$ denotes a scalar. Denoting the (as yet unknown) transform of $a_i$
by $a'_i$, we know that in the $x'$-coordinate system the condition (18.41) reads

$$a'_i v'_i = \phi'. \tag{18.42}$$

Since $\phi$ is a scalar, $\phi = \phi'$. Furthermore, since $v_i$ are components of a vector,
it follows that

$$v'_i = R_{ij} v_j.$$

Accordingly, subtracting (18.42) from (18.41) gives

$$\left(a_j - R_{ij} a'_i\right) v_j = 0. \tag{18.43}$$

On the left-hand side, a summation over $j$ is implied, so we cannot assert
directly that the coefficients of $v_j$ vanish. However, since (18.43) should be
valid for any coordinate system, we may specifically choose the coordinate
system in which the components of $\mathbf{v}$ read $v_1 = 1$ and $v_{i(\neq 1)} = 0$. Equation
(18.43) then reduces to

$$a_1 - R_{i1} a'_i = 0.$$

Similarly, choosing an appropriately rotated coordinate system that provides
the components $v_2 = 1$ and $v_{i(\neq 2)} = 0$, we infer that

$$a_2 - R_{i2} a'_i = 0.$$

Continuing in this manner, we find that

$$a_j = R_{ij} a'_i \quad \text{for all } j.$$

Multiplying both sides by $R_{kj}$ yields

$$R_{kj} a_j = R_{kj} R_{ij} a'_i = \delta_{ki} a'_i = a'_k,$$

i.e.,

$$a'_k = R_{kj} a_j,$$

which is the transformation law for components of a vector. We thus conclude
that the $a_i$ constitute the components of a vector, denoted by $\mathbf{a}$. ♣


Remark. In applications of the above theorem, one must be certain that the 
coordinate system employed is arbitrarily rotated, and this hypothesis repre- 
sents a very strict condition that is not often satisfied. 


18.4.7 Quotient Theorem for Two-Subscripted Quantities 


As a second important case, assume a set of $n^2$ quantities $a_{ij}$ such that $a_{ij} v_i v_j$
is a scalar $\phi$ for a vector $\mathbf{v}$ and for any rotated coordinate system. Our task is
to examine whether such two-subscripted quantities $a_{ij}$ constitute the components
of a tensor of second order. We shall see, however, that the answer is
negative. In fact, we can say nothing about the tensorial character of $a_{ij}$ itself from
the hypothesis noted above, which implies the need to modify the quotient
theorem for two-subscripted quantities.

Developing the modified quotient theorem requires a discussion that parallels
that given in Sect. 18.4.6. By hypothesis, we can set

$$a_{ij} v_i v_j = \phi$$

in the given $x$-coordinate system and similarly

$$a'_{k\ell} v'_k v'_\ell = \phi' \tag{18.44}$$

in the $x'$-coordinate system. In (18.44), we have denoted the as yet unknown
transforms of $a_{ij}$ by $a'_{k\ell}$. Using the transformation law of $v_i$ as well as the
fact that $\phi = \phi'$ gives us

$$\left(a_{ij} - R_{ki} R_{\ell j} a'_{k\ell}\right) v_i v_j = 0. \tag{18.45}$$


As a summation is implied over $i$ and $j$, we cannot infer directly that
the coefficients of $v_i v_j$ vanish. Instead, we successively choose components
$(v_1, v_2, v_3, \cdots)$ as $(1, 0, 0, \cdots)$ and $(0, 1, 0, \cdots)$, etc., to get

$$a_{11} - R_{k1} R_{\ell 1} a'_{k\ell} = 0, \quad a_{22} - R_{k2} R_{\ell 2} a'_{k\ell} = 0, \quad \cdots. \tag{18.46}$$

These results imply that the terms $a_{ij}$ with $i = j$ obey the transformation
law of second-order tensors. Nevertheless, they tell us nothing about the terms
involving $a_{ij}$ with $i \neq j$. To further examine this point, we set components as
$v_1 \neq 0$, $v_2 \neq 0$, and $v_i = 0$ for other $i$. Then, (18.45) becomes

$$\begin{aligned}
&\left(a_{11} - R_{k1} R_{\ell 1} a'_{k\ell}\right) v_1 v_1 + \left(a_{12} - R_{k1} R_{\ell 2} a'_{k\ell}\right) v_1 v_2 \\
&\quad + \left(a_{21} - R_{k2} R_{\ell 1} a'_{k\ell}\right) v_2 v_1 + \left(a_{22} - R_{k2} R_{\ell 2} a'_{k\ell}\right) v_2 v_2 = 0.
\end{aligned}$$

Owing to (18.46), we find that the coefficients of $v_1 v_1$ and $v_2 v_2$ vanish. Furthermore,
since

$$R_{k1} R_{\ell 2} a'_{k\ell} = R_{k2} R_{\ell 1} a'_{\ell k}$$

is simply a relabeling of the indices $k$ and $\ell$, we see that

$$\left[(a_{12} + a_{21}) - (a'_{k\ell} + a'_{\ell k}) R_{k2} R_{\ell 1}\right] v_1 v_2 = 0.$$

Thus, choosing $v_1 = 1$ and $v_2 = 1$ gives us

$$a_{12} + a_{21} = (a'_{k\ell} + a'_{\ell k}) R_{k2} R_{\ell 1}.$$

Again, this process may be repeated to yield

$$a_{ij} + a_{ji} = (a'_{k\ell} + a'_{\ell k}) R_{kj} R_{\ell i},$$

i.e.,

$$a'_{k\ell} + a'_{\ell k} = R_{kj} R_{\ell i} \left(a_{ij} + a_{ji}\right).$$

This is indeed the transformation law of a second-order tensor, but it refers to
$a_{ij} + a_{ji}$, i.e., the symmetric part of $2a_{ij}$, and not to $a_{ij}$ as such. Accordingly,
the quotient theorem for this case must be stated as follows.


♠ Quotient theorem for two-subscripted quantities:

Suppose a set of $n^2$ quantities $a_{ij}$ to be such that for a vector $\mathbf{v}$ and for
any rotated system, the sum $a_{ij} v_i v_j$ is a scalar. Then the symmetric parts
$(a_{ij} + a_{ji})/2$ of $a_{ij}$ are the components of a second-order tensor.


Remark.


1. If in addition to the above hypothesis, we are given that the $a_{ij}$ are
symmetric, then the $a_{ij}$ themselves are the components of a second-order
tensor.

2. Nothing can be inferred about the tensorial character of the antisymmetric
part of $a_{ij}$ from the above hypothesis, because that part contributes nothing
to the scalar $\phi$, as seen from

$$\left(a_{ij} - a_{ji}\right) v_i v_j = a_{ij} v_i v_j - a_{ji} v_i v_j = a_{ij} v_i v_j - a_{ij} v_j v_i = 0,$$

where in the last step the indices $i$ and $j$ are interchanged.
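
The second remark can be illustrated numerically. In the sketch below, the array `a` and the vector `v` are arbitrary sample values (assumptions made for illustration); the quadratic form $a_{ij} v_i v_j$ is evaluated for the full array and for its symmetric and antisymmetric parts.

```python
def quadratic_form(a, v):
    """Evaluate the double contraction a_ij v_i v_j."""
    n = len(v)
    return sum(a[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

# Arbitrary, non-symmetric sample quantities a_ij
a = [[1.0, 2.0, 0.0],
     [-4.0, 3.0, 5.0],
     [6.0, 1.0, -2.0]]
sym = [[0.5 * (a[i][j] + a[j][i]) for j in range(3)] for i in range(3)]
anti = [[0.5 * (a[i][j] - a[j][i]) for j in range(3)] for i in range(3)]

v = [0.5, -1.5, 2.0]
phi_full = quadratic_form(a, v)      # scalar built from a_ij
phi_sym = quadratic_form(sym, v)     # scalar built from the symmetric part
phi_anti = quadratic_form(anti, v)   # contribution of the antisymmetric part
```

The antisymmetric contribution `phi_anti` vanishes, so `phi_full` and `phi_sym` coincide: the scalar $\phi$ carries no information about the antisymmetric part.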


Example Using the quotient theorem, we show that the two-subscripted quantities
$a_{ij}$ given by

$$[a_{ij}] = \begin{pmatrix} (x_2)^2 & -x_1 x_2 \\ -x_1 x_2 & (x_1)^2 \end{pmatrix}$$

are the components of a second-order tensor. Note first that $a_{ij} = a_{ji}$ and that
the outer product $x_i x_j$ is a second-order tensor. Contracting the quantities
$a_{ij}$ with the outer product $x_i x_j$, we obtain

$$a_{ij} x_i x_j = (x_2)^2 (x_1)^2 - x_1 x_2 x_1 x_2 - x_1 x_2 x_2 x_1 + (x_1)^2 (x_2)^2 = 0, \tag{18.47}$$

in which the last term, 0, is a zeroth-order tensor. Since (18.47) holds for any
rotated coordinate system, we conclude that $a_{ij}$ is a second-order tensor.


Exercises 


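1. Derive the equation $\nabla \times (\nabla \times \mathbf{v}) = \nabla(\nabla \cdot \mathbf{v}) - \nabla^2 \mathbf{v}$.
Solution: Straightforward calculations yield

$$\begin{aligned}
[\nabla \times (\nabla \times \mathbf{v})]_i
&= \varepsilon_{ijk}\varepsilon_{k\ell m} \frac{\partial^2 v_m}{\partial x_j \partial x_\ell}
 = \left(\delta_{i\ell}\delta_{jm} - \delta_{im}\delta_{j\ell}\right) \frac{\partial^2 v_m}{\partial x_j \partial x_\ell} \\
&= \frac{\partial^2 v_j}{\partial x_j \partial x_i} - \frac{\partial^2 v_i}{\partial x_j \partial x_j}
 = \frac{\partial}{\partial x_i}\left(\frac{\partial v_j}{\partial x_j}\right) - \frac{\partial^2 v_i}{\partial x_j \partial x_j} \\
&= \frac{\partial}{\partial x_i}(\nabla \cdot \mathbf{v}) - \nabla^2 v_i
 = [\nabla(\nabla \cdot \mathbf{v})]_i - [\nabla^2 \mathbf{v}]_i \\
&= [\nabla(\nabla \cdot \mathbf{v}) - \nabla^2 \mathbf{v}]_i.
\end{aligned}$$ ♣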
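
The solution of Exercise 1 rests on the identity $\varepsilon_{ijk}\varepsilon_{k\ell m} = \delta_{i\ell}\delta_{jm} - \delta_{im}\delta_{j\ell}$. As a small sketch (with function names chosen here for illustration and indices running over 0, 1, 2), it can be verified by brute force over all 81 index combinations:

```python
from itertools import product

def eps(i, j, k):
    """Levi-Civita symbol for indices in {0, 1, 2}."""
    return (i - j) * (j - k) * (k - i) // 2

def delta(i, j):
    """Kronecker delta."""
    return 1 if i == j else 0

def identity_holds():
    """Check eps_ijk eps_klm = delta_il delta_jm - delta_im delta_jl."""
    for i, j, l, m in product(range(3), repeat=4):
        lhs = sum(eps(i, j, k) * eps(k, l, m) for k in range(3))
        rhs = delta(i, l) * delta(j, m) - delta(i, m) * delta(j, l)
        if lhs != rhs:
            return False
    return True
```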


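2. Derive the expression (18.38) using the result (18.37).
Solution: We consider the vector products (in the sense of elementary
vector calculus) of the transformed basis arrows $\mathbf{e}'_i$ given by

$$\mathbf{e}'_i \times \mathbf{e}'_j = (R_{i\ell}\mathbf{e}_\ell) \times (R_{jm}\mathbf{e}_m) = R_{i\ell}R_{jm}\, \mathbf{e}_\ell \times \mathbf{e}_m.$$

Forming the scalar product with $\mathbf{e}_n$ yields

$$(\mathbf{e}'_i \times \mathbf{e}'_j) \cdot \mathbf{e}_n = R_{i\ell}R_{jm}\, (\mathbf{e}_\ell \times \mathbf{e}_m) \cdot \mathbf{e}_n,$$

where on the right-hand side only two terms survive for each fixed
value of $n$ since

$$(\mathbf{e}_\ell \times \mathbf{e}_m) \cdot \mathbf{e}_n = \begin{cases} +1 & \text{if } (\ell,m,n) \text{ is a cyclic permutation of } (1,2,3), \\ -1 & \text{if } (\ell,m,n) \text{ is an anticyclic permutation of } (1,2,3), \\ 0 & \text{otherwise.} \end{cases}$$

(Here we assume that the coordinate systems associated with $\{\mathbf{e}_i\}$
and $\{\mathbf{e}'_i\}$ are both right-handed.) Hence, we have

$$(\mathbf{e}'_i \times \mathbf{e}'_j) \cdot \mathbf{e}_n = R_{i\ell}R_{jm} - R_{im}R_{j\ell}, \tag{18.48}$$

where $\ell, m, n$ is a cyclic permutation of 1, 2, 3. Moreover, since

$$\mathbf{e}'_i \times \mathbf{e}'_j = \mathbf{e}'_k = R_{kr}\mathbf{e}_r, \tag{18.49}$$

it follows from (18.48) and (18.49) that

$$R_{kr}\, \mathbf{e}_r \cdot \mathbf{e}_n = R_{kr}\delta_{rn} = R_{kn} = R_{i\ell}R_{jm} - R_{im}R_{j\ell},$$

where again $i, j, k$ and $\ell, m, n$ are both cyclic permutations of
1, 2, 3. If $\{\mathbf{e}'_i\}$ is left-handed, a similar procedure yields

$$R_{kn} = -\left(R_{i\ell}R_{jm} - R_{im}R_{j\ell}\right).$$

Substituting these results into (18.37), we finally arrive at the
conclusion that

$$w'_k = |R|\, R_{kn} w_n,$$

which is a transformation law for a pseudovector. ♣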


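3. Show that the process of contraction of an $N$th-order tensor produces
another tensor of order $N - 2$.
Solution: Let $T_{ij\cdots\ell\cdots m\cdots k}$ be the components of an $N$th-order tensor;
then

$$T'_{ij\cdots\ell\cdots m\cdots k} = R_{ip}R_{jq} \cdots R_{\ell r} \cdots R_{ms} \cdots R_{kn}\, T_{pq\cdots r\cdots s\cdots n}.$$

Thus if, e.g., we make the two subscripts $\ell$ and $m$ equal and sum
over all the values of these subscripts, we obtain

$$\begin{aligned}
T'_{ij\cdots\ell\cdots\ell\cdots k} &= R_{ip}R_{jq} \cdots R_{\ell r} \cdots R_{\ell s} \cdots R_{kn}\, T_{pq\cdots r\cdots s\cdots n} \\
&= R_{ip}R_{jq} \cdots \delta_{rs} \cdots R_{kn}\, T_{pq\cdots r\cdots s\cdots n},
\end{aligned}$$

showing that $T_{ij\cdots\ell\cdots\ell\cdots k}$ are the components of a (different) Cartesian
tensor of order $N - 2$. ♣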


18.5 Applications in Physics and Engineering 


This section is devoted to illustrations of physical applications of second- and 
higher-order Cartesian tensors. We start with an example from mechanics and 
follow that by examples from electromagnetism and elasticity. 


18.5.1 Inertia Tensor 


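Consider a collection of rigidly connected particles, wherein the $\alpha$th particle
has mass $m^{(\alpha)}$ and is positioned at $\mathbf{r}^{(\alpha)}$ with respect to the origin $O$. Suppose
that the rigid assembly is rotating about an axis through $O$ with angular
velocity $\boldsymbol{\omega}$. The angular momentum $\mathbf{J}$ of the assembly is given by

$$\mathbf{J} = \sum_\alpha \mathbf{r}^{(\alpha)} \times \mathbf{p}^{(\alpha)}.$$

Here $\mathbf{p}^{(\alpha)} = m^{(\alpha)} \dot{\mathbf{r}}^{(\alpha)}$ and $\dot{\mathbf{r}}^{(\alpha)} = \boldsymbol{\omega} \times \mathbf{r}^{(\alpha)}$ for any $\alpha$, whose components are
expressed in subscript form as

$$p_k^{(\alpha)} = m^{(\alpha)} \dot{x}_k^{(\alpha)} \quad \text{and} \quad \dot{x}_k^{(\alpha)} = \sum_{\ell,m} \varepsilon_{k\ell m}\, \omega_\ell\, x_m^{(\alpha)}.$$

Thus we obtain

$$\begin{aligned}
J_i &= \sum_\alpha \sum_{j,k} \varepsilon_{ijk}\, x_j^{(\alpha)} p_k^{(\alpha)}
     = \sum_\alpha m^{(\alpha)} \sum_{j,k,\ell,m} \varepsilon_{ijk}\, x_j^{(\alpha)}\, \varepsilon_{k\ell m}\, \omega_\ell\, x_m^{(\alpha)} \\
    &= \sum_\alpha m^{(\alpha)} \sum_{j,\ell,m} \left(\delta_{i\ell}\delta_{jm} - \delta_{im}\delta_{j\ell}\right) x_j^{(\alpha)} x_m^{(\alpha)} \omega_\ell \\
    &= \sum_\alpha \sum_\ell m^{(\alpha)} \left[ \left(r^{(\alpha)}\right)^2 \delta_{i\ell} - x_i^{(\alpha)} x_\ell^{(\alpha)} \right] \omega_\ell
     \equiv \sum_\ell I_{i\ell}\, \omega_\ell,
\end{aligned} \tag{18.50}$$

with the definition

$$I_{i\ell} = \sum_\alpha m^{(\alpha)} \left[ \left(r^{(\alpha)}\right)^2 \delta_{i\ell} - x_i^{(\alpha)} x_\ell^{(\alpha)} \right]. \tag{18.51}$$

The set of quantities $I_{i\ell}$ forms a symmetric second-order Cartesian tensor; the
symmetric property expressed by $I_{i\ell} = I_{\ell i}$ follows readily from (18.51). The
fact that the $I_{i\ell}$ form a tensor can be proved by applying the quotient rule (see
Sect. 18.4.6) to equation (18.50), wherein $J_i$ and $\omega_\ell$ are vectors. The tensor
$I_{i\ell}$ is called the inertia tensor of the assembly with respect to $O$. As evident
from (18.51), $I_{i\ell}$ depends only on the distribution of mass in the assembly and
not on the direction or magnitude of the angular velocity of the assembly, $\boldsymbol{\omega}$.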
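
As a numerical sketch of (18.50) and (18.51), the inertia tensor of a few point masses can be assembled and the angular momentum $I_{i\ell}\omega_\ell$ compared with the direct sum $\sum_\alpha \mathbf{r}^{(\alpha)} \times \mathbf{p}^{(\alpha)}$. The masses, positions, and angular velocity below are arbitrary sample values, and the function names are illustrative assumptions.

```python
def cross(a, b):
    """Elementary vector product of two 3-component lists."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def inertia_tensor(masses, positions):
    """I_il = sum_a m_a (r_a^2 delta_il - x_i x_l), cf. (18.51)."""
    I = [[0.0] * 3 for _ in range(3)]
    for m, r in zip(masses, positions):
        r2 = sum(x * x for x in r)
        for i in range(3):
            for l in range(3):
                I[i][l] += m * ((r2 if i == l else 0.0) - r[i] * r[l])
    return I

def angular_momentum_direct(masses, positions, omega):
    """J = sum_a r_a x p_a with p_a = m_a (omega x r_a)."""
    J = [0.0, 0.0, 0.0]
    for m, r in zip(masses, positions):
        p = [m * c for c in cross(omega, r)]
        L = cross(r, p)
        J = [J[i] + L[i] for i in range(3)]
    return J

# Sample rigid assembly (values chosen only for illustration)
masses = [1.0, 2.0, 0.5]
positions = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [-1.0, 2.0, 0.5]]
omega = [0.0, 0.0, 2.0]

I = inertia_tensor(masses, positions)
J_tensor = [sum(I[i][l] * omega[l] for l in range(3)) for i in range(3)]
J_direct = angular_momentum_direct(masses, positions, omega)
```

The tensor `I` is computed once from the mass distribution and can then be reused for any angular velocity, which reflects the remark that $I_{i\ell}$ does not depend on $\boldsymbol{\omega}$.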

If a continuous rigid body is considered, $m^{(\alpha)}$ is replaced by the mass
distribution $\rho(\mathbf{r})$ and the summation $\sum_\alpha$ by the integral $\int dV$ over the volume
of the whole body. When expanded in Cartesian coordinates, the inertia
tensor of a continuous body would have the form

$$[I_{ij}] = \begin{pmatrix}
\int (y^2 + z^2)\rho\, dV & -\int xy\rho\, dV & -\int zx\rho\, dV \\
-\int xy\rho\, dV & \int (z^2 + x^2)\rho\, dV & -\int yz\rho\, dV \\
-\int zx\rho\, dV & -\int yz\rho\, dV & \int (x^2 + y^2)\rho\, dV
\end{pmatrix}.$$

The diagonal elements of this tensor are called the moments of inertia
and the off-diagonal elements without the negative signs are known as the
products of inertia.

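It is possible to show that the kinetic energy $K$ of the rotating system
is given by $K = \frac{1}{2}\sum_{j,\ell} I_{j\ell}\, \omega_j \omega_\ell$, which is a scalar obtained by twice contracting
the vector $\omega_j$ with the inertia tensor $I_{j\ell}$. In fact, an argument parallel to that
leading to (18.50) yields

$$\begin{aligned}
K &= \frac{1}{2}\sum_\alpha m^{(\alpha)}\, \dot{\mathbf{r}}^{(\alpha)} \cdot \dot{\mathbf{r}}^{(\alpha)}
   = \frac{1}{2}\sum_\alpha m^{(\alpha)} \sum_{i,j,k,\ell,m} \varepsilon_{ijk}\, \omega_j x_k^{(\alpha)}\, \varepsilon_{i\ell m}\, \omega_\ell x_m^{(\alpha)} \\
  &= \frac{1}{2}\sum_\alpha m^{(\alpha)} \sum_{j,k,\ell,m} \left(\delta_{j\ell}\delta_{km} - \delta_{jm}\delta_{k\ell}\right) x_k^{(\alpha)} x_m^{(\alpha)}\, \omega_j \omega_\ell
   = \frac{1}{2}\sum_{j,\ell} I_{j\ell}\, \omega_j \omega_\ell.
\end{aligned}$$

This shows that the kinetic energy of the rotating body can be expressed as
a scalar obtained by twice contracting the vector $\omega_j$ with the inertia tensor
$I_{j\ell}$. Alternatively, since $J_j = \sum_\ell I_{j\ell}\, \omega_\ell$, the kinetic energy may be written as

$$K = \frac{1}{2}\sum_j J_j\, \omega_j.$$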


18.5.2 Tensors in Electromagnetism in Solids 


Magnetic susceptibility and electric conductivity are also examples of
physical quantities represented by second-order tensors. For the former, we
have the standard expression

$$M_i = \sum_j \chi_{ij} H_j, \tag{18.52}$$

where $\mathbf{M}$ is the magnetic moment per unit volume and $\mathbf{H}$ is the magnetic
field. Similarly, for the case of electric conductivity, we can write

$$j_i = \sum_j \sigma_{ij} E_j. \tag{18.53}$$

Here, the current density $\mathbf{j}$ (current per unit perpendicular area) is related to
the electric field $\mathbf{E}$. In both cases, we have a vector on the left-hand side and
the contraction of a second-order tensor with another vector on the right-hand
side.

For isotropic media, the vector $\mathbf{M}$ is parallel to $\mathbf{H}$ and, similarly, the
vector $\mathbf{j}$ is parallel to $\mathbf{E}$. Thus, the above tensors satisfy $\chi_{ij} = \chi\delta_{ij}$ and
$\sigma_{ij} = \sigma\delta_{ij}$, respectively, resulting in $\mathbf{M} = \chi\mathbf{H}$ and $\mathbf{j} = \sigma\mathbf{E}$. However, for
anisotropic materials such as crystals, the magnetic susceptibility and electric
conductivity may be different along different crystal axes, thus making $\chi_{ij}$
and $\sigma_{ij}$ general second-order tensors (usually symmetric).


18.5.3 Electromagnetic Field Tensor 


All the tensors that we have considered in this chapter so far relate to the three
dimensions of space and they are defined as having a certain transformation
property under spatial rotations. In this subsection, we shall have the occasion
to use a tensor in the four dimensions of relativistic space-time; the tensor is
the electromagnetic field tensor $F_{\mu\nu}$.

Recall that an electromagnetic field in free space is governed by the
Maxwell equations, which take the form

$$\nabla \cdot \mathbf{B} = 0, \qquad \nabla \cdot \mathbf{E} = 4\pi k_1 \rho,$$

$$\nabla \times \mathbf{B} = 4\pi k_2 \mathbf{J} + \frac{k_2}{k_1}\frac{\partial \mathbf{E}}{\partial t}, \qquad \nabla \times \mathbf{E} = -k_3 \frac{\partial \mathbf{B}}{\partial t}.$$

Here $\mathbf{E}$ is the electric field intensity, $\mathbf{B}$ is the magnetic induction, $\rho$ is the
charge density, and $\mathbf{J}$ is the current density. There are several ways of defining
the values of the constants $k_i$ ($i = 1, 2, 3$); indeed, their values depend on which
system of units we use. Typical examples are listed in Table 18.1.
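The Maxwell equations take on a particularly simple and elegant form on
introducing the electromagnetic field tensor $F_{\mu\nu}$, defined as

$$F_{\mu\nu} = \partial_\mu A_\nu - \partial_\nu A_\mu. \tag{18.54}$$

Here, $A_\mu = (\phi/c, -\mathbf{A})$ is called a four-potential, determined by the scalar
potential $\phi$ and the vector potential $\mathbf{A}$ that generate the fields $\mathbf{B} = \nabla \times \mathbf{A}$ and
$\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$. The symbol $\partial_\mu$ in (18.54) denotes the partial derivative
with respect to the $\mu$th coordinate. Straightforward calculations yield

$$[F_{\mu\nu}] = \begin{pmatrix}
0 & E^1/c & E^2/c & E^3/c \\
-E^1/c & 0 & -B^3 & B^2 \\
-E^2/c & B^3 & 0 & -B^1 \\
-E^3/c & -B^2 & B^1 & 0
\end{pmatrix}, \tag{18.55}$$

where $\mathbf{E} = (E^1, E^2, E^3)$ and $\mathbf{B} = (B^1, B^2, B^3)$. We also introduce another
relevant tensor defined by

$$[F^{\mu\nu}] = \begin{pmatrix}
0 & -E^1/c & -E^2/c & -E^3/c \\
E^1/c & 0 & -B^3 & B^2 \\
E^2/c & B^3 & 0 & -B^1 \\
E^3/c & -B^2 & B^1 & 0
\end{pmatrix}, \tag{18.56}$$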


in which $\mu$ and $\nu$ are superscripts as opposed to (18.55), where they are
subscripts. As a result, we can see that the Maxwell equations are equivalent
to the following two field equations:

$$\sum_\mu \partial_\mu F^{\mu\nu} = \mu_0\, j^\nu,$$

$$\partial_\sigma F_{\mu\nu} + \partial_\mu F_{\nu\sigma} + \partial_\nu F_{\sigma\mu} = 0,$$

where $j^\nu = (\rho c, \mathbf{J})$ is the four-current density.


Table 18.1. Values of the constants $k_i$ ($i = 1, 2, 3$) in the Maxwell equations. $\mu_0$, $\varepsilon_0$,
and $c$ are the permeability, permittivity, and speed of light in vacuum, respectively

System of units   k_1            k_2          k_3
MKSA              1/(4πε_0)      μ_0/(4π)     1
CGS-esu           1              1/c^2        1
CGS-emu           c^2            1            1
CGS-Gauss         1              1/c          1/c
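
The relation between the arrays (18.55) and (18.56) can be sketched numerically. The sketch assumes, as is conventional but not stated in this section, that indices are raised with the diagonal Minkowski metric of signature $(+,-,-,-)$, so that $F^{\mu\nu}$ differs from $F_{\mu\nu}$ by a sign whenever exactly one index is spatial. The function names and field values are illustrative.

```python
def field_tensor_lower(E, B, c):
    """Covariant components F_{mu nu} laid out as in (18.55)."""
    Ex, Ey, Ez = E
    Bx, By, Bz = B
    return [[0.0, Ex / c, Ey / c, Ez / c],
            [-Ex / c, 0.0, -Bz, By],
            [-Ey / c, Bz, 0.0, -Bx],
            [-Ez / c, -By, Bx, 0.0]]

# Diagonal entries of the assumed metric, signature (+, -, -, -)
ETA = [1.0, -1.0, -1.0, -1.0]

def raise_indices(F_lower):
    """F^{mu nu} = eta^{mu mu} eta^{nu nu} F_{mu nu} for a diagonal metric."""
    return [[ETA[m] * ETA[n] * F_lower[m][n] for n in range(4)]
            for m in range(4)]

# Sample field values in arbitrary units, for illustration only
E = (1.0, 2.0, 3.0)
B = (4.0, 5.0, 6.0)
c = 2.0
F_lower = field_tensor_lower(E, B, c)
F_upper = raise_indices(F_lower)
```

Under this assumption the electric entries change sign while the magnetic block is unchanged, in agreement with the layout of (18.56).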


Remark. The distinction between superscripts and subscripts on the symbol 
F shown in (18.55) and (18.56), respectively, is clarified in Chap. 19, which 
deals with non-Cartesian tensor calculus. 


18.5.4 Elastic Tensor 


Thus far, we have focused on the physical applications of second-order
tensors, which relate two vectors. Now, we extend this idea to a situation
where a fourth-order tensor relates two physical second-order tensors. Such
relationships commonly occur in elasticity theory. In the framework of this
theory, the local deformation of an elastic body at any interior point $P$ is
described by a second-order symmetric tensor $e_{ij}$ called the strain tensor,
which is given by

$$e_{ij} = \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right),$$

where $\mathbf{u}$ is the displacement vector describing the strain of a small volume
element. Similarly, we can describe the stress in the body at $P$ by a second-order
symmetric stress tensor $p_{ij}$; the quantity $p_{ij}$ is the $x_j$-component of
the stress vector acting across a plane through $P$, whose normal lies in the $x_i$-direction.
A generalization of Hooke's law that relates the stress and strain
tensors is

$$p_{ij} = \sum_{k,\ell} c_{ijk\ell}\, e_{k\ell}, \tag{18.57}$$

where $c_{ijk\ell}$ is a fourth-order Cartesian tensor.
Specifically, for an isotropic medium, we must have an isotropic tensor for
$c_{ijk\ell}$; the most general fourth-order isotropic tensor is

$$c_{ijk\ell} = \lambda\, \delta_{ij}\delta_{k\ell} + \eta\, \delta_{ik}\delta_{j\ell} + \nu\, \delta_{i\ell}\delta_{jk}.$$

Substituting this into (18.57) yields

$$p_{ij} = \lambda\, \delta_{ij} \sum_k e_{kk} + \eta\, e_{ij} + \nu\, e_{ji}. \tag{18.58}$$

Note that $e_{ij}$ is symmetric. Hence, if we write $\eta + \nu = 2\mu$, (18.58) takes the
conventional form

$$p_{ij} = \lambda \sum_k e_{kk}\, \delta_{ij} + 2\mu\, e_{ij},$$

in which $\lambda$ and $\mu$ are known as Lamé constants.
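
The reduction of (18.57) to the Lamé form can be checked with a short sketch. The strain values and the constants below are arbitrary illustrative numbers, and the function names are assumptions; the isotropic tensor is built with $\eta = \nu = \mu$, which is one choice satisfying $\eta + \nu = 2\mu$.

```python
def delta(i, j):
    """Kronecker delta as a float."""
    return 1.0 if i == j else 0.0

def isotropic_elastic_tensor(lam, eta, nu):
    """c_ijkl = lam d_ij d_kl + eta d_ik d_jl + nu d_il d_jk."""
    return [[[[lam * delta(i, j) * delta(k, l)
               + eta * delta(i, k) * delta(j, l)
               + nu * delta(i, l) * delta(j, k)
               for l in range(3)] for k in range(3)]
             for j in range(3)] for i in range(3)]

def stress_from_tensor(c, e):
    """Generalized Hooke law (18.57): p_ij = c_ijkl e_kl."""
    return [[sum(c[i][j][k][l] * e[k][l]
                 for k in range(3) for l in range(3))
             for j in range(3)] for i in range(3)]

def stress_lame(lam, mu, e):
    """Conventional form p_ij = lam e_kk d_ij + 2 mu e_ij."""
    trace = sum(e[k][k] for k in range(3))
    return [[lam * trace * delta(i, j) + 2.0 * mu * e[i][j]
             for j in range(3)] for i in range(3)]

lam, mu = 2.0, 0.5
# Symmetric sample strain tensor (illustrative values)
e = [[0.25, 0.5, 0.0],
     [0.5, -1.0, 1.5],
     [0.0, 1.5, 2.0]]

p_full = stress_from_tensor(isotropic_elastic_tensor(lam, mu, mu), e)
p_lame = stress_lame(lam, mu, e)
```

The two results agree because the strain is symmetric; for a non-symmetric array the terms $\eta e_{ij}$ and $\nu e_{ji}$ would have to be kept separate.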


19 


Non-Cartesian Tensors 


Abstract Having discussed tensor theory based on Cartesian coordinates, we now 
move on to its counterpart, i.e., tensors described by curvilinear coordinate systems. 
The use of a curvilinear coordinate system endows the tensor calculus with the 
properties of “covariance” (Sect. 19.1.3) and “contravariance” (Sect. 19.1.4), both 
of which are new concepts originating from the nonorthogonality of the coordinate 
axes. 


19.1 Curvilinear Coordinate Systems 


19.1.1 Local Basis Vectors 


We have thus far restricted our attention to the study of Cartesian tensors,
where, from a practical standpoint, only rigid rotations of axes (proper
and/or improper) are taken into account as coordinate transformations. However,
we must free ourselves from this restriction and develop the tensor calculus
in terms of curvilinear coordinate systems. In advanced mathematical
physics, we often have to deal with tensor analysis on curved surfaces (or more
abstract manifolds) on which orthonormal coordinate systems cannot be defined,
and in such cases the theory developed thus far is entirely inadequate.
This means that we have to formulate tensors and their transformations in
terms of general curvilinear coordinate systems.

To begin with, we review some properties of general curvilinear coordinates.
Suppose that the position of an arbitrary point $P$ in a three-dimensional
space has Cartesian coordinates $x, y, z$. In general, this position may be expressed
in terms of three curvilinear coordinates $u_1, u_2, u_3$, which are functions
of $x, y, z$ as explicitly represented by

$$u_1 = u_1(x, y, z), \quad u_2 = u_2(x, y, z), \quad u_3 = u_3(x, y, z).$$

We denote by $\mathbf{r}$ the position arrow connecting the origin $O$ and the point $P$.
Obviously, the direction and magnitude of the arrow depend on the coordinates
of $P$, which are symbolized by

$$\mathbf{r} = \mathbf{r}(u_1, u_2, u_3).$$

We now consider the partial derivative of $\mathbf{r}$ with respect to $u_i$, i.e.,

$$\mathbf{e}_i = \frac{\partial \mathbf{r}}{\partial u_i}. \tag{19.1}$$

From the definition, the vectors $\mathbf{e}_i$ are directed along the corresponding coordinate
lines at the point $P$. As a result, an infinitesimal vector displacement
$d\mathbf{r}$ in curvilinear coordinates is given by

$$d\mathbf{r} = \frac{\partial \mathbf{r}}{\partial u_i}\, du_i = \mathbf{e}_i\, du_i,$$

where the summation convention is employed. The vectors $\mathbf{e}_i$ are referred to
as local basis vectors. (In precise terminology, they are called covariant
local basis vectors, as explained later.)

It is obvious from (19.1) that the vectors $\mathbf{e}_i$ are functions of the curvilinear
coordinates $u_i$, namely, $\mathbf{e}_i = \mathbf{e}_i(u_1, u_2, u_3)$. This implies that the directions
and magnitudes of the $\mathbf{e}_i$ vary from point to point in the space considered,
which is in contrast to the case of a Cartesian coordinate system, where the
basis vectors are spatially independent. Spatial dependence of basis vectors
is actually one of the most important properties of curvilinear coordinate
systems.

Another notable property of curvilinear coordinate systems is the fact that
they allow us to define another useful set of three vectors at $P$ as

$$\boldsymbol{\epsilon}_i = \nabla u_i.$$

Clearly the direction of $\boldsymbol{\epsilon}_i$ is normal to the surface $u_i = \text{const}$, thus being
different from the directions of any of the vectors $\mathbf{e}_i$ ($i = 1, 2, 3$) in general (see
Fig. 19.1). Therefore, at each point $P$ in a curvilinear coordinate system,
there exist two sets of basis vectors defined by

$$\mathbf{e}_i = \frac{\partial \mathbf{r}}{\partial u_i} \quad \text{and} \quad \boldsymbol{\epsilon}_i = \nabla u_i. \tag{19.2}$$

Fig. 19.1. (a) Spatial dependence of $\mathbf{e}_i$ in the curvilinear coordinate system. (b)
Difference between $\mathbf{e}_i$ and $\boldsymbol{\epsilon}_i$

In the tensor analysis literature, the set of vectors $\boldsymbol{\epsilon}_i$ introduced above is
denoted by $\mathbf{e}^i$, the index being placed as a superscript to distinguish it from the
first set of vectors $\mathbf{e}_i$. Relating to the notation above, we introduce a modified
summation convention as follows: if we find a lower-case alphabetic index
that appears twice, once as a subscript and once as a superscript, we sum
over all the values that the index can take. In this convention, the curvilinear
coordinates are denoted by $u^1, u^2, u^3$, with the index raised (see the remark
in Sect. 19.1.3), to arrive at the following definition.


♠ Local basis vectors:
A curvilinear coordinate system is characterized by two sets of three
vectors $\{\mathbf{e}_i\}$ and $\{\mathbf{e}^i\}$ defined by

$$\mathbf{e}_i = \frac{\partial \mathbf{r}}{\partial u^i} \quad \text{and} \quad \mathbf{e}^i = \nabla u^i.$$

Here, the $\mathbf{e}_i$ are referred to as the covariant local basis vectors, and the
$\mathbf{e}^i$ as the contravariant local basis vectors.


The prefix “local” emphasizes the fact that the lengths and orientations of
these basis vectors vary from point to point in the space; this fact is explicitly
represented by

$$\mathbf{e}_i = \mathbf{e}_i(u^1, u^2, u^3) \quad \text{and} \quad \mathbf{e}^i = \mathbf{e}^i(u^1, u^2, u^3).$$

For the sake of conciseness, we omit the prefix in the subsequent discussions
and use the terms contravariant (or covariant) basis vectors, bearing the
locality in mind.


Remark.


1. In common practice indices that represent contravariant character are
placed as superscripts and those indicating covariant character as subscripts.


2. For Cartesian coordinate systems, the two sets of basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$
are identical and, hence, there is no need to differentiate between contravariance
and covariance.

3. In derivatives such as $\partial \mathbf{r}/\partial u^i$, the $i$ is considered as a subscript.


19.1.2 Reciprocity Relations 


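Generally the covariant basis vectors $\mathbf{e}_1$, $\mathbf{e}_2$, and $\mathbf{e}_3$ are neither of unit length
nor are they orthogonal to each other; this is also true for the contravariant
basis vectors, $\mathbf{e}^1$, $\mathbf{e}^2$, and $\mathbf{e}^3$. Nevertheless, the sets $\{\mathbf{e}_i\}$ and $\{\mathbf{e}^i\}$ still have
an important property as stated below.


♠ Reciprocity relations:
The sets of covariant and contravariant local basis vectors $\{\mathbf{e}_i\}$ and
$\{\mathbf{e}^j\}$ satisfy the reciprocity relations such that

$$\mathbf{e}_i \cdot \mathbf{e}^j = \delta_i^j, \tag{19.3}$$

where the scalar product of the vectors is taken in the sense of elementary
vector calculus.


Proof By using Cartesian representation, we have

$$\begin{aligned}
\mathbf{e}_i \cdot \mathbf{e}^j &= \frac{\partial \mathbf{r}}{\partial u^i} \cdot \nabla u^j
 = \left(\frac{\partial x}{\partial u^i}, \frac{\partial y}{\partial u^i}, \frac{\partial z}{\partial u^i}\right) \cdot \left(\frac{\partial u^j}{\partial x}, \frac{\partial u^j}{\partial y}, \frac{\partial u^j}{\partial z}\right) \\
&= \frac{\partial x}{\partial u^i}\frac{\partial u^j}{\partial x} + \frac{\partial y}{\partial u^i}\frac{\partial u^j}{\partial y} + \frac{\partial z}{\partial u^i}\frac{\partial u^j}{\partial z}
 = \frac{\partial u^j}{\partial u^i} = \delta_i^j.
\end{aligned}$$ ♣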


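Remark. The reciprocity relation (19.3) implies that each covariant (or contravariant)
basis vector $\mathbf{e}_i$ (or $\mathbf{e}^i$) is perpendicular to all contravariant (or
covariant) basis vectors $\mathbf{e}^k$ (or $\mathbf{e}_k$) except $k = i$. For instance, $\mathbf{e}_1$ is perpendicular
to $\mathbf{e}^2$ and $\mathbf{e}^3$, but not to $\mathbf{e}^1$ in general. To be precise, the vectors $\mathbf{e}_1$
and $\mathbf{e}^1$ make an angle $\theta$ that satisfies

$$\mathbf{e}_1 \cdot \mathbf{e}^1 = |\mathbf{e}_1|\, |\mathbf{e}^1| \cos\theta = 1,$$

where $|\mathbf{e}_1| \neq 1$ and $|\mathbf{e}^1| \neq 1$ in general.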
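
To make the reciprocity relations concrete, consider as an illustrative assumption the nonorthogonal coordinates $u^1 = x$, $u^2 = y - x^2$, $u^3 = z$, i.e., $x = u^1$, $y = u^2 + (u^1)^2$, $z = u^3$. The sketch below hard-codes the analytic derivatives for this particular choice and evaluates $\mathbf{e}_i \cdot \mathbf{e}^j$ at a sample point; all names are introduced here for illustration.

```python
def covariant_basis(u):
    """e_i = dr/du^i for x = u1, y = u2 + u1**2, z = u3."""
    u1, u2, u3 = u
    return [[1.0, 2.0 * u1, 0.0],   # e_1 = (1, 2 u1, 0)
            [0.0, 1.0, 0.0],        # e_2
            [0.0, 0.0, 1.0]]        # e_3

def contravariant_basis(u):
    """e^i = grad u^i for u1 = x, u2 = y - x**2, u3 = z."""
    u1, u2, u3 = u
    x = u1
    return [[1.0, 0.0, 0.0],        # e^1 = grad x
            [-2.0 * x, 1.0, 0.0],   # e^2 = grad (y - x^2)
            [0.0, 0.0, 1.0]]        # e^3 = grad z

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def reciprocity_matrix(u):
    """Matrix of scalar products e_i . e^j at the point u."""
    lower = covariant_basis(u)
    upper = contravariant_basis(u)
    return [[dot(lower[i], upper[j]) for j in range(3)] for i in range(3)]

point = (1.5, -0.3, 2.0)
G = reciprocity_matrix(point)
e_lower = covariant_basis(point)
overlap_12 = dot(e_lower[0], e_lower[1])  # e_1 . e_2, nonzero here
```

At this point `G` is the unit matrix even though $\mathbf{e}_1$ and $\mathbf{e}_2$ are neither orthogonal nor of unit length.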


19.1.3 Transformation Law of Covariant Basis Vectors 


We are now in a position to discuss the concept of general transformations
from one coordinate system, $u^1, u^2, u^3$, to another, $u'^1, u'^2, u'^3$. A coordinate
transformation is described by using the three equations

$$u'^i = u'^i(u^1, u^2, u^3), \tag{19.4}$$

for $i = 1, 2, 3$, in which the new coordinates $u'^i$ can be arbitrary functions of
the old ones $u^i$. We assume that the transformation can be inverted, so that
we can write the old coordinates in terms of the new ones as

$$u^i = u^i(u'^1, u'^2, u'^3).$$

We now formulate the transformation law of basis vectors. The two sets
of basis vectors in the new coordinate system are given by

$$\mathbf{e}'_i = \frac{\partial \mathbf{r}}{\partial u'^i} \quad \text{and} \quad \mathbf{e}'^i = \nabla u'^i. \tag{19.5}$$

Using the chain rule, we find that the first set of basis vectors yields

$$\mathbf{e}'_i = \frac{\partial \mathbf{r}}{\partial u^j}\frac{\partial u^j}{\partial u'^i} = \frac{\partial u^j}{\partial u'^i}\, \mathbf{e}_j. \tag{19.6}$$

This describes the transformation behavior of the local covariant basis vectors
from the unprimed one $\mathbf{e}_j$ to the primed one $\mathbf{e}'_i$ under the coordinate transformation
(19.4). Note that the partial derivatives as well as the basis vectors
in (19.6) vary from point to point. Hence, relation (19.6) is valid under the
condition that all terms involved are evaluated at the same point $P$ in the
space being considered.
In the same manner, it follows that

$$\mathbf{e}_k = \frac{\partial \mathbf{r}}{\partial u^k} = \frac{\partial \mathbf{r}}{\partial u'^i}\frac{\partial u'^i}{\partial u^k} = \frac{\partial u'^i}{\partial u^k}\, \mathbf{e}'_i.$$

We thus have proved the following theorem:


♠ Transformation law of covariant basis vectors:
The sets of local covariant basis vectors $\{\mathbf{e}_i(u^j)\}$ and $\{\mathbf{e}'_i(u'^j)\}$ associated
with two different curvilinear coordinate systems are related at a
point $P$ by

$$\mathbf{e}'_i = \frac{\partial u^j}{\partial u'^i}\, \mathbf{e}_j \quad \text{and} \quad \mathbf{e}_k = \frac{\partial u'^i}{\partial u^k}\, \mathbf{e}'_i, \tag{19.7}$$

in which the partial derivatives are to be evaluated at $P$.


Remark. Observe that in all the mathematical expressions above (and below),
the summation convention is applied to the indices that are repeated
in one term as both a subscript and a superscript. Indeed, it was to satisfy
this summation convention that the coordinates were written as $u^i$ rather
than $u_i$.


19.1.4 Transformation Law of Contravariant Basis Vectors 


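Next we consider the transformation law of the contravariant basis vectors
$\mathbf{e}^i = \nabla u^i$. Recall that in terms of a rectangular Cartesian coordinate system,
the operator $\nabla$ is expressed as

$$\nabla = \mathbf{i}\frac{\partial}{\partial x} + \mathbf{j}\frac{\partial}{\partial y} + \mathbf{k}\frac{\partial}{\partial z},$$

where $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are mutually orthogonal basis vectors of unit length. It then
follows that

$$\mathbf{e}'^k = \nabla u'^k = \mathbf{i}\frac{\partial u'^k}{\partial x} + \mathbf{j}\frac{\partial u'^k}{\partial y} + \mathbf{k}\frac{\partial u'^k}{\partial z},$$

in which the first partial derivative reads

$$\frac{\partial u'^k}{\partial x} = \frac{\partial u^i}{\partial x}\frac{\partial u'^k}{\partial u^i}$$

and other derivatives are written in the same way. Hence, we have

$$\mathbf{e}'^k = \left(\mathbf{i}\frac{\partial u^i}{\partial x} + \mathbf{j}\frac{\partial u^i}{\partial y} + \mathbf{k}\frac{\partial u^i}{\partial z}\right)\frac{\partial u'^k}{\partial u^i} = \frac{\partial u'^k}{\partial u^i}\, \mathbf{e}^i.$$

Similarly, we have

$$\mathbf{e}^j = \nabla u^j = \left(\mathbf{i}\frac{\partial u'^\ell}{\partial x} + \mathbf{j}\frac{\partial u'^\ell}{\partial y} + \mathbf{k}\frac{\partial u'^\ell}{\partial z}\right)\frac{\partial u^j}{\partial u'^\ell} = \frac{\partial u^j}{\partial u'^\ell}\, \mathbf{e}'^\ell.$$

These results are summarized as follows:


♠ Transformation law of contravariant basis vectors:
The two sets of local contravariant basis vectors $\{\mathbf{e}^i(u^j)\}$ and $\{\mathbf{e}'^k(u'^\ell)\}$
are related at a point $P$ by

$$\mathbf{e}'^k = \frac{\partial u'^k}{\partial u^i}\, \mathbf{e}^i \quad \text{and} \quad \mathbf{e}^j = \frac{\partial u^j}{\partial u'^\ell}\, \mathbf{e}'^\ell, \tag{19.8}$$

where the partial derivatives are again to be evaluated at $P$.

It should be emphasized again that, owing to the summation convention, the
repeated indices in (19.8) appear once as a superscript and once as a subscript.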


19.1.5 Components of a Vector 


Given the two bases $\mathbf{e}_i$ and $\mathbf{e}^i$, we may express a general geometric arrow $\mathbf{a}$
(i.e., a vector $\mathbf{a}$) equally well in terms of either basis as follows:

\[
\mathbf{a} = a^1\mathbf{e}_1 + a^2\mathbf{e}_2 + a^3\mathbf{e}_3 = a^i\mathbf{e}_i,
\]
\[
\mathbf{a} = a_1\mathbf{e}^1 + a_2\mathbf{e}^2 + a_3\mathbf{e}^3 = a_i\mathbf{e}^i.
\]

The $a^i$ are called the contravariant components of the vector $\mathbf{a}$ and the $a_i$
the covariant components. Both kinds of components $a^i$ and $a_i$ describe
the same vector $\mathbf{a}$, but they are associated with different basis vectors $\mathbf{e}_i$ and
$\mathbf{e}^i$, respectively. In plain words, a vector assigned at a point in a curvilinear
coordinate system has two different expressions, say, $(a^1, a^2, a^3)$ and $(a_1, a_2, a_3)$,
for the same vector $\mathbf{a}$. The tensorial characters of the two kinds of components
are inherently different from each other, as we shall see in subsequent
discussions.

For any vector $\mathbf{a}$, the two kinds of components $a^i$ and $a_i$ are readily obtained
by forming the scalar products,

\[
\mathbf{a}\cdot\mathbf{e}^i = a^j\,\mathbf{e}_j\cdot\mathbf{e}^i = a^j\delta^i_j = a^i
\]
and
\[
\mathbf{a}\cdot\mathbf{e}_i = a_j\,\mathbf{e}^j\cdot\mathbf{e}_i = a_j\delta^j_i = a_i,
\]


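where we have used the reciprocity relation (19.3). Furthermore, using the
transformation law of the covariant basis vectors given in (19.7) gives us

\[
\mathbf{a} = a'^i\,\mathbf{e}'_i = a^j\,\mathbf{e}_j = a^j\,\frac{\partial u'^i}{\partial u^j}\,\mathbf{e}'_i. \tag{19.9}
\]

This provides the transformation law of the contravariant components of a
vector such that

\[
a'^i = \frac{\partial u'^i}{\partial u^j}\,a^j. \tag{19.10}
\]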


This relation is, in fact, the defining property for a set of quantities $a^i$ to
form the contravariant components of a vector. The formal statement is given
below.


@ Contravariant component of a vector:
Quantities $a^i$ associated with a point P are said to be the contravariant
components of a vector if these quantities transform through the equation

\[
a'^i = \frac{\partial u'^i}{\partial u^j}\,a^j, \tag{19.11}
\]

where the partial derivatives are evaluated at P.


Remark. It might occur that a given ordered set of quantities $a^i$ associated
with a point P has nothing to do with a vector; only those sets satisfying the
transformation law (19.11) serve as (contravariant) components of a vector.
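
The transformation law (19.11) can be tried out numerically. The following is a minimal sketch, assuming the unprimed system is planar Cartesian $(x, y)$ and the primed system is polar $(\rho, \phi)$; the function names are illustrative and do not come from the text. It transforms the contravariant components and then rebuilds the arrow from the polar basis to confirm that the same vector is described.

```python
import math

def jacobian_cart_to_polar(x, y):
    """Return the matrix of partial derivatives d(rho, phi)/d(x, y)."""
    rho = math.hypot(x, y)
    return [[x / rho, y / rho],
            [-y / rho**2, x / rho**2]]

def transform_contravariant(a_cart, x, y):
    """Apply a'^i = (du'^i/du^j) a^j at the point (x, y)."""
    jac = jacobian_cart_to_polar(x, y)
    return [sum(jac[i][j] * a_cart[j] for j in range(2)) for i in range(2)]

def polar_covariant_basis(x, y):
    """Basis vectors e'_i = dr/du'^i, written in Cartesian components."""
    rho, phi = math.hypot(x, y), math.atan2(y, x)
    e_rho = [math.cos(phi), math.sin(phi)]
    e_phi = [-rho * math.sin(phi), rho * math.cos(phi)]
    return e_rho, e_phi

x, y = 1.0, 2.0            # point P (assumed example values)
a_cart = [3.0, -1.0]       # contravariant components in the Cartesian system
a_polar = transform_contravariant(a_cart, x, y)

# Rebuild the arrow a = a'^i e'_i and compare with the Cartesian components.
e_rho, e_phi = polar_covariant_basis(x, y)
rebuilt = [a_polar[0] * e_rho[k] + a_polar[1] * e_phi[k] for k in range(2)]
print(a_polar, rebuilt)
```

The components change from one system to the other, but the rebuilt arrow agrees with the original one, which is the content of (19.9).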


Analogously to the case of (19.9), it follows from the identity for an
arbitrary vector $\mathbf{a}$,

\[
\mathbf{a} = a'_i\,\mathbf{e}'^i = a_j\,\mathbf{e}^j = a_j\,\frac{\partial u^j}{\partial u'^i}\,\mathbf{e}'^i,
\]

that the transformation law of covariant components yields

\[
a'_i = \frac{\partial u^j}{\partial u'^i}\,a_j. \tag{19.12}
\]

Again we take this result as the defining property of the covariant components
of a vector.


@ Covariant components of a vector:
Quantities $a_i$ associated with a point P are said to be the covariant
components of a vector if those quantities transform through the equation

\[
a'_i = \frac{\partial u^j}{\partial u'^i}\,a_j, \tag{19.13}
\]

where the partial derivatives are evaluated at P.


Remark. Other textbooks may use the expression "contravariant (or covariant)
vector," which is a distinctly different concept from a vector $\mathbf{a}$
or its components $a^i$ (or $a_i$) that we have just defined. Rather, a
contravariant vector is a collection of ordered triples,

\[
\left\{ (a^1, a^2, a^3),\ (a'^1, a'^2, a'^3),\ (a''^1, a''^2, a''^3),\ \cdots \right\},
\]

in which all the ordered triples consist of contravariant components of the
same vector $\mathbf{a}$ associated with different coordinate systems. We should bear
in mind that a contravariant (or covariant) vector is not expressed by a geometric
arrow as is done for a vector.


19.1.6 Components of a Tensor 


We now define geometric objects of the contravariant class, which are more
complicated in character than vectors, and begin with the following:


@ Contravariant component of a tensor:

Index quantities $T^{ij}$ associated with a point P are said to be contravariant
components of a tensor if these quantities transform according to the
equation

\[
T'^{ij} = \frac{\partial u'^i}{\partial u^l}\,\frac{\partial u'^j}{\partial u^m}\,T^{lm}. \tag{19.14}
\]


There is no difficulty in defining covariant tensors of higher orders. For a 
tensor of second order, e.g., we have the definition below. 


@ Covariant components of a tensor:
Index quantities $T_{ij}$ are said to be covariant components of a second-order
tensor if these quantities transform according to the equation

\[
T'_{ij} = \frac{\partial u^l}{\partial u'^i}\,\frac{\partial u^m}{\partial u'^j}\,T_{lm}. \tag{19.15}
\]


We shall see later that there are many examples of tensors of this kind in 
physics and engineering. The moment of inertia, the stress of elasticity, and the 
electromagnetic field are cases in point; if their components in terms of certain 
coordinate systems are evaluated, they all turn out to obey the transformation 
law (19.14). 

As a matter of terminology, all quantities satisfying (19.11), (19.13) and (19.14), (19.15)
are called components of a first-order tensor and components of a
second-order tensor, respectively; the order is the number of indices
attached. The definitions of tensors of higher orders are given through a
straightforward generalization of the above. Conversely, we can define a tensor
of zero order, called a scalar, that involves no index so that its single
component (i.e., the scalar itself) is unchanged under any coordinate
transformation; namely,

\[
T' = T.
\]

Such a quantity is called an invariant.


Remark. For any components of tensors, the number of indices is independent 
of the number of dimensions of the space considered. The definitions above 
for vectors, tensors, and scalars are all valid for an arbitrary n-dimensional 
space. 


19.1.7 Mixed Components of a Tensor 


Having defined contravariant and covariant components of a tensor, we can
now define another class of components, called mixed components of a
tensor, that involve the two characters simultaneously.


@ Mixed components of a tensor:
Index quantities $T^i{}_{jk}$ are said to be the mixed components of a tensor
of the third order if these quantities transform according to the equation

\[
T'^i{}_{jk} = \frac{\partial u'^i}{\partial u^l}\,\frac{\partial u^m}{\partial u'^j}\,\frac{\partial u^n}{\partial u'^k}\,T^l{}_{mn}.
\]

Clearly, $T^i{}_{jk}$ transforms contravariantly with respect to the first index $i$ but
covariantly with respect to the other indices $j$ and $k$.

If we consider the components of higher-order tensors in non-Cartesian
coordinates, there are even more possibilities. As an example, let us consider
a second-order tensor $\mathbf{T}$. Using the outer product notation, we may write $\mathbf{T}$
in three different ways:

\[
\mathbf{T} = T^{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j = T^i{}_j\,\mathbf{e}_i \otimes \mathbf{e}^j = T_{ij}\,\mathbf{e}^i \otimes \mathbf{e}^j,
\]

where $T^{ij}$, $T^i{}_j$, and $T_{ij}$ are called the contravariant, mixed, and covariant
components of $\mathbf{T}$, respectively. It is important to remember that these
three sets of quantities form the components of the same tensor $\mathbf{T}$ but refer
to different tensor bases made up from the basis vectors of the coordinate
system. Again, if we use Cartesian coordinates, the three sets of components
are identical.

We may generalize the above equation to higher-order components. An
object $T^{i_1 i_2 \cdots i_n}{}_{j_1 j_2 \cdots j_m}$ is called a component of type $(n, m)$, in which the integers $n$
and $m$ represent the numbers of superscripts and subscripts, respectively.
By definition, components carrying only superscripts (i.e., $m = 0$) or those
carrying only subscripts (i.e., $n = 0$) are referred to as the contravariant and
covariant components, respectively; all others are called mixed components.


Remark. The order of indices needs caution. For instance, we shall see later
that in general

\[
T^i{}_j \neq T_j{}^i.
\]

Nevertheless, we can write $T^i_j$ with no clarification of the order of $i$ and $j$ if
no ambiguity occurs or the order of indices is irrelevant.


19.1.8 Kronecker Delta 


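The Kronecker delta is a special kind of second-order tensor that has
mixed components given by $\delta^i_j$ and is defined as follows:

\[
\delta^i_j = \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j. \end{cases}
\]

As these are mixed components of a tensor, they transform as

\[
\delta'^i_j = \frac{\partial u'^i}{\partial u^k}\,\frac{\partial u^l}{\partial u'^j}\,\delta^k_l
= \frac{\partial u'^i}{\partial u^k}\,\frac{\partial u^k}{\partial u'^j}
= \frac{\partial u'^i}{\partial u'^j}
= \begin{cases} 1, & \text{if } i = j, \\ 0, & \text{if } i \neq j, \end{cases}
\]

since in the last partial derivative, $u'^i$ and $u'^j$ are independent coordinates.
Thus, we obtain the result

\[
\delta'^i_j = \delta^i_j, \tag{19.16}
\]

which means that the tensor consisting of $\delta^i_j$ has the same components in
all coordinate systems. This is why the tensor consisting of $\delta^i_j$ is called the
fundamental mixed tensor.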
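
The invariance (19.16) amounts to the statement that the two Jacobian matrices are inverse to each other. The following is a small numerical sketch under the assumption that the two systems are planar Cartesian $(x, y)$ and polar $(\rho, \phi)$; all names are illustrative.

```python
import math

def forward_jacobian(x, y):
    """fwd[i][k] = du'^i/du^k with u' = (rho, phi) and u = (x, y)."""
    rho = math.hypot(x, y)
    return [[x / rho, y / rho],
            [-y / rho**2, x / rho**2]]

def inverse_jacobian(x, y):
    """inv[l][j] = du^l/du'^j with u = (x, y) and u' = (rho, phi)."""
    rho, phi = math.hypot(x, y), math.atan2(y, x)
    return [[math.cos(phi), -rho * math.sin(phi)],
            [math.sin(phi), rho * math.cos(phi)]]

def transformed_delta(x, y):
    """Evaluate (du'^i/du^k)(du^l/du'^j) delta^k_l at the point (x, y)."""
    fwd, inv = forward_jacobian(x, y), inverse_jacobian(x, y)
    delta = [[1.0, 0.0], [0.0, 1.0]]
    return [[sum(fwd[i][k] * inv[l][j] * delta[k][l]
                 for k in range(2) for l in range(2))
             for j in range(2)]
            for i in range(2)]

delta_primed = transformed_delta(0.8, -1.5)
print(delta_primed)
```

Up to rounding, the transformed components again form the unit matrix, whichever point is chosen away from the origin.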


Remark. The components $\delta^{ij}$ (or $\delta_{ij}$) are of no special importance, since they
do not satisfy the invariance condition (19.16), which means that their values
change when we use other coordinate systems. An exception is the case of
rectangular coordinate systems, where the contravariant and covariant tensors
become identical, so that we have $\delta^{ij} = \delta^i_j = \delta_{ij}$.


19.2 Metric Tensor 


19.2.1 Definition 


We now introduce important quantities that describe the geometric character
of the space arithmetized by a certain curvilinear coordinate system. We know
that the scalar product of a vector $\mathbf{a}$ and local basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ yields

\[
a_i = \mathbf{a}\cdot\mathbf{e}_i = a^j\,(\mathbf{e}_j\cdot\mathbf{e}_i) \quad\text{and}\quad a^i = \mathbf{a}\cdot\mathbf{e}^i = a_j\,(\mathbf{e}^j\cdot\mathbf{e}^i). \tag{19.17}
\]

Now we introduce the following notation:

\[
\mathbf{e}_i\cdot\mathbf{e}_j = g_{ij} = g_{ji}
\]
and
\[
\mathbf{e}^i\cdot\mathbf{e}^j = g^{ij} = g^{ji}.
\]

We can then write (19.17) in the form

\[
a_i = g_{ij}\,a^j \quad\text{and}\quad a^i = g^{ij}\,a_j.
\]

These equations express the covariant components of the vector $\mathbf{a}$ in terms
of its contravariant components, and vice versa. We shall see that the nine
quantities $g_{ij}$ form a second-order tensor called a metric tensor.


@ Metric tensor:
Two-index quantities defined by

\[
g_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j \quad\text{and}\quad g^{ij} = \mathbf{e}^i\cdot\mathbf{e}^j \tag{19.18}
\]

serve as covariant and contravariant components of a second-order tensor
called a metric tensor.


The proof of the tensor character for the above is given in Exercise 1. 


Remark.

1. Since both $\mathbf{e}_i$ and $\mathbf{e}^i$ are functions of the coordinates, so are the quantities
$g_{ij}$ and $g^{ij}$.
2. The mixed components $g^i_j$ of the metric tensor are identical to those of
$\delta^i_j$, since, by definition, we have

\[
g^i_j = \mathbf{e}^i\cdot\mathbf{e}_j = \delta^i_j.
\]


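Examples We calculate the elements $g_{ij}$ for cylindrical coordinates, where
$(u^1, u^2, u^3) = (\rho, \phi, z)$ and $\rho$ and $\phi$ are related to Cartesian coordinates $x$ and
$y$ as $x = \rho\cos\phi$ and $y = \rho\sin\phi$. Hence, the position vector $\mathbf{r}$ of any point
may be written as

\[
\mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j} + z\,\mathbf{k},
\]

where $\mathbf{i}, \mathbf{j}, \mathbf{k}$ are orthogonal basis vectors. By definition, we have

\[
\mathbf{e}_1 = \frac{\partial\mathbf{r}}{\partial\rho} = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j}, \tag{19.19}
\]
\[
\mathbf{e}_2 = \frac{\partial\mathbf{r}}{\partial\phi} = -\rho\sin\phi\,\mathbf{i} + \rho\cos\phi\,\mathbf{j}, \tag{19.20}
\]
\[
\mathbf{e}_3 = \frac{\partial\mathbf{r}}{\partial z} = \mathbf{k}. \tag{19.21}
\]

Thus the components of the metric tensor $[g_{ij}] = [\mathbf{e}_i\cdot\mathbf{e}_j]$ are found to be

\[
[g_{ij}] = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \rho^2 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]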
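
The cylindrical example can be checked with a few lines of code. This is a minimal sketch that takes the basis vectors (19.19) to (19.21) as given and forms all scalar products $\mathbf{e}_i\cdot\mathbf{e}_j$; the function names and the sample point are assumptions made for illustration.

```python
import math

def cylindrical_basis(rho, phi):
    """Covariant basis vectors of (19.19)-(19.21) in Cartesian components."""
    e1 = [math.cos(phi), math.sin(phi), 0.0]
    e2 = [-rho * math.sin(phi), rho * math.cos(phi), 0.0]
    e3 = [0.0, 0.0, 1.0]
    return [e1, e2, e3]

def metric_from_basis(basis):
    """g_ij = e_i . e_j for every pair of basis vectors."""
    return [[sum(p * q for p, q in zip(ei, ej)) for ej in basis] for ei in basis]

rho, phi = 2.0, 0.7  # sample point (the metric does not depend on z)
g = metric_from_basis(cylindrical_basis(rho, phi))
for row in g:
    print(row)
```

The off-diagonal entries vanish up to rounding and the diagonal reproduces $(1, \rho^2, 1)$, independently of the value of $\phi$.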


19.2.2 Geometric Role of Metric Tensors 


The quantities $g_{ij}$ (or $g^{ij}$) describe the fundamental geometric character of
a space arithmetized by a certain $u^i$-coordinate system with a basis $\{\mathbf{e}_i\}$.
A geometric role for $g_{ij}$ was implied in the definition (19.18), where $g_{ij}$ equals
the scalar product of the two covariant local basis vectors $\mathbf{e}_i$ and $\mathbf{e}_j$. Hence,
$g_{ij}$ determines the angles of local basis vectors $\mathbf{e}_i$ and $\mathbf{e}_j$ at each point and
thus describes the coordinate ($u^k$)-dependence of the vectors $\mathbf{e}_i = \mathbf{e}_i(u^k)$ and
$\mathbf{e}_j = \mathbf{e}_j(u^k)$ that span the space being considered. This implies the possibility
that the metric tensor $\mathbf{g}$ rather than the basis vectors can be regarded as a
more fundamental object determining the geometric nature of the space in
question. Indeed, we can establish the framework of tensor calculus based on
a knowledge of the spatial dependence of the metric tensor $\mathbf{g}$ without any
information about the local basis vectors. This point is dealt with in
Sect. 20.3.5.

The role of $g_{ij}$ in determining the geometric nature of the space also follows
from another standpoint, as shown below. Let $ds$ be the arc length between
two infinitely close points. We denote by $d\mathbf{r}$ the vector joining the two points,
whose covariant components are $du_i$ and contravariant components $du^i$. Then,
since $d\mathbf{r} = \mathbf{e}_i\,du^i = \mathbf{e}^k\,du_k$, we have

\[
(ds)^2 = |d\mathbf{r}|^2 = d\mathbf{r}\cdot d\mathbf{r}
= \mathbf{e}_i\,du^i\cdot\mathbf{e}_k\,du^k = \mathbf{e}_i\,du^i\cdot\mathbf{e}^k\,du_k = \mathbf{e}^i\,du_i\cdot\mathbf{e}^k\,du_k,
\]

or

\[
(ds)^2 = g_{ik}\,du^i\,du^k, \tag{19.22}
\]
\[
(ds)^2 = g^{ik}\,du_i\,du_k, \tag{19.23}
\]
\[
(ds)^2 = du_i\,du^i. \tag{19.24}
\]

Since $(ds)^2$ is a scalar, all of the quantities on the right-hand sides are also
scalars. It should also be noted that in (19.22) and (19.23), the $du^i$ (or $du_i$)
are contravariant (or covariant) components of a vector. Hence, in view of the
quotient theorem regarding two-index quantities (see Sect. 18.4.7), it turns out
that the symmetric quantities $g_{ik}$ (or $g^{ik}$) form covariant (or contravariant)
components of a second-order tensor.
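
As an illustration of (19.22), the length of a curve can be accumulated from the metric alone, without reference to the basis vectors. The sketch below assumes cylindrical coordinates with $[g_{ij}] = \mathrm{diag}(1, \rho^2, 1)$ and a helix $\rho = 2$, $\phi = t$, $z = t/2$ chosen purely as an example; the helper names are not from the text.

```python
import math

def metric_cyl(u):
    """Covariant metric of cylindrical coordinates u = (rho, phi, z)."""
    rho = u[0]
    return [[1.0, 0.0, 0.0],
            [0.0, rho**2, 0.0],
            [0.0, 0.0, 1.0]]

def arc_length(curve, t0, t1, steps=2000):
    """Sum sqrt(g_ik du^i du^k) over small steps along the curve u(t)."""
    total = 0.0
    dt = (t1 - t0) / steps
    for n in range(steps):
        ua = curve(t0 + n * dt)
        ub = curve(t0 + (n + 1) * dt)
        mid = [(a + b) / 2.0 for a, b in zip(ua, ub)]   # evaluate g at the midpoint
        du = [b - a for a, b in zip(ua, ub)]
        g = metric_cyl(mid)
        ds2 = sum(g[i][k] * du[i] * du[k] for i in range(3) for k in range(3))
        total += math.sqrt(ds2)
    return total

def helix(t):
    """Example curve: rho = 2, phi = t, z = t / 2."""
    return [2.0, t, 0.5 * t]

length = arc_length(helix, 0.0, 2.0 * math.pi)
exact = 2.0 * math.pi * math.sqrt(2.0**2 + 0.5**2)
print(length, exact)
```

For this helix the summed length agrees with the elementary result $2\pi\sqrt{\rho^2 + c^2}$ with $c = 1/2$.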


19.2.3 Riemann Space and Metric Tensor 


We have seen that in terms of tensor calculus, the metric tensor $\mathbf{g}$ rather than
the local basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ is a more fundamental object in determining
geometric properties of the space being considered. In fact, an abstract space
of points to which we assign a certain class of a second-order tensor $\mathbf{g}$ at each
point is referred to by a special name as stated below, which gives a formal
definition of the metric tensor $\mathbf{g}$ in the language of tensor calculus.


@ Riemann space:

A finite-dimensional space of points labeled by an ordered set of real
coordinates $u^1, u^2, \cdots, u^n$ is called a Riemann space if it is possible to
define two-index quantities $g_{ij}$ that possess the following properties:

1. Each entity $g_{ij}(u^1, u^2, \cdots, u^n)$ is a real single-valued function of the
coordinates and has continuous partial derivatives.

2. $g_{ij} = g_{ji}$.

3. $g = \det[g_{ik}] \neq 0$.

The tensor $\mathbf{g}$ formed by the two-index quantities $g_{ij}$ noted above is called
a metric tensor of the space.


Remark. Note that the above definition of a metric tensor is free of the concept 
of local basis vectors. 


In this context, the superscripted components $g^{ij}$ are defined by

\[
g^{ik}\,g_{kj} = \delta^i_j \quad\text{or}\quad g^{ik} = \frac{C^{ik}}{g},
\]

where $C^{ik}$ ($= C^{ki}$) is the cofactor of $g_{ik}$ in the determinant $g = \det[g_{ik}]$. (See
Exercise 2 for the proof of the above.)

Our familiar Euclidean space is a particular class of Riemann space as 
stated below. 


@ Flat Riemann space:
A Riemann space is flat if and only if it admits a system of rectangular
Cartesian coordinates $x^1, x^2, \cdots, x^n$ such that at every point of the space,

\[
(ds)^2 = \epsilon_1\,(dx^1)^2 + \epsilon_2\,(dx^2)^2 + \cdots + \epsilon_n\,(dx^n)^2, \tag{19.25}
\]

where each $\epsilon_i$ equals either $+1$ or $-1$.


@ Euclidean space:
A Euclidean space is a flat Riemann space for which all $\epsilon_i$ in (19.25)
are equal to $+1$.


19.2.4 Elements of Arc, Area, and Volume 


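Below we describe several useful relations in connection with the elements of
arc length, areas, and volumes in terms of metric tensors.

1. Element of arc length
The element of arc length $ds_i$ along a particular coordinate curve $u^i$ with
fixed $i$ is

\[
ds_i = |d\mathbf{r}| = |\mathbf{e}_i|\,du^i = \sqrt{\mathbf{e}_i\cdot\mathbf{e}_i}\,du^i = \sqrt{g_{ii}}\,du^i \quad (\text{no summation over } i).
\]

2. Element of area
The element of area $d\sigma_1$ in the coordinate surface $u^1 = \text{const}$, for instance,
reads

\[
\begin{aligned}
d\sigma_1 &= |d\mathbf{r}_2 \times d\mathbf{r}_3| = |\mathbf{e}_2 \times \mathbf{e}_3|\,du^2\,du^3 \\
&= \sqrt{(\mathbf{e}_2 \times \mathbf{e}_3)\cdot(\mathbf{e}_2 \times \mathbf{e}_3)}\;du^2\,du^3 \\
&= \sqrt{(\mathbf{e}_2\cdot\mathbf{e}_2)(\mathbf{e}_3\cdot\mathbf{e}_3) - (\mathbf{e}_2\cdot\mathbf{e}_3)(\mathbf{e}_2\cdot\mathbf{e}_3)}\;du^2\,du^3 \\
&= \sqrt{g_{22}\,g_{33} - (g_{23})^2}\;du^2\,du^3.
\end{aligned}
\]

Similarly, we have

\[
d\sigma_2 = \sqrt{g_{33}\,g_{11} - (g_{13})^2}\;du^3\,du^1,
\qquad
d\sigma_3 = \sqrt{g_{11}\,g_{22} - (g_{12})^2}\;du^1\,du^2,
\]

which are summarized by

\[
d\sigma_i = \sqrt{g_{jj}\,g_{kk} - (g_{jk})^2}\;du^j\,du^k \quad (\text{no summation over } j \text{ and } k),
\]

where $i, j, k$ is a cyclic permutation of the numbers 1, 2, 3.

3. Element of volume
Finally, we can derive the equation for the element of volume as

\[
dV = |(d\mathbf{r}_1 \times d\mathbf{r}_2)\cdot d\mathbf{r}_3| = |(\mathbf{e}_1 \times \mathbf{e}_2)\cdot\mathbf{e}_3|\,du^1\,du^2\,du^3 = \sqrt{g}\,du^1\,du^2\,du^3,
\]

where $g = \det[g_{ik}]$. [Proof of the identity $(\mathbf{e}_1 \times \mathbf{e}_2)\cdot\mathbf{e}_3 = \sqrt{g}$ is given in
Exercise 3.]


Our results are summarized as:

@ Theorem:
Elements of arc length $ds_i$, area $d\sigma_i$, and volume $dV$, respectively, are
represented in terms of curvilinear coordinate systems by

\[
ds_i = \sqrt{g_{ii}}\,du^i \quad (\text{no sum over } i),
\]
\[
d\sigma_i = \sqrt{g_{jj}\,g_{kk} - (g_{jk})^2}\;du^j\,du^k \quad (\text{no sum over } j \text{ and } k), \text{ and}
\]
\[
dV = \sqrt{g}\,du^1\,du^2\,du^3,
\]

where $i, j, k$ is a cyclic permutation of 1, 2, 3.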


19.2.5 Scale Factors 


In this subsection, we consider the case of orthogonal coordinate systems, for
which the basic descriptive quantities are the scale factors (or the metric
coefficients) $h_1, h_2, h_3$, defined by

\[
h_1 = \sqrt{g_{11}}, \quad h_2 = \sqrt{g_{22}}, \quad h_3 = \sqrt{g_{33}}.
\]

Obviously, they satisfy the equation

\[
(ds)^2 = (h_1\,du^1)^2 + (h_2\,du^2)^2 + (h_3\,du^3)^2.
\]

Furthermore, since $g_{ij} = 0$ for $i \neq j$, we have

\[
ds_i = h_i\,du^i \quad (\text{no sum over } i),
\]
\[
d\sigma_i = h_j h_k\,du^j\,du^k \quad (\text{no sum over } j \text{ and } k),
\]
\[
dV = h_1 h_2 h_3\,du^1\,du^2\,du^3,
\]

where $i, j, k$ is a cyclic permutation of 1, 2, 3.

Examples 1. In rectangular Cartesian coordinates,

\[
(ds)^2 = (dx)^2 + (dy)^2 + (dz)^2,
\]

so

\[
h_1 = h_2 = h_3 = 1.
\]

2. In cylindrical coordinates,

\[
(ds)^2 = (dR)^2 + (R\,d\theta)^2 + (dz)^2,
\]

so

\[
h_1 = 1, \quad h_2 = R, \quad h_3 = 1.
\]

3. In spherical coordinates,

\[
(ds)^2 = (dR)^2 + (R\,d\theta)^2 + (R\sin\theta\,d\phi)^2,
\]

so

\[
h_1 = 1, \quad h_2 = R, \quad h_3 = R\sin\theta.
\]
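
The volume element $dV = h_1 h_2 h_3\,du^1 du^2 du^3$ can be exercised on a case with a known answer. The sketch below assumes the spherical scale factors $h_1 = 1$, $h_2 = R$, $h_3 = R\sin\theta$ and integrates over a ball with a simple midpoint rule; the grid size and the function names are illustrative choices.

```python
import math

def scale_factors_spherical(R, theta):
    """Scale factors (h1, h2, h3) for u = (R, theta, phi)."""
    return 1.0, R, R * math.sin(theta)

def ball_volume(radius, n=200):
    """Midpoint-rule integral of dV = h1 h2 h3 dR dtheta dphi over a ball."""
    dR = radius / n
    dtheta = math.pi / n
    total = 0.0
    for a in range(n):
        R = (a + 0.5) * dR
        for b in range(n):
            theta = (b + 0.5) * dtheta
            h1, h2, h3 = scale_factors_spherical(R, theta)
            total += h1 * h2 * h3 * dR * dtheta
    # The integrand does not depend on phi, so that integral contributes 2*pi.
    return 2.0 * math.pi * total

volume = ball_volume(1.0)
print(volume, 4.0 * math.pi / 3.0)
```

With the default grid the result is close to $4\pi/3$; refining `n` reduces the midpoint-rule error further.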


19.2.6 Representation of Basis Vectors in Derivatives 


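It is often desirable to represent local covariant basis vectors $\mathbf{e}_i$ as well as
components of metric tensors $g_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j$ at a point $\mathbf{r}$ in terms of derivatives
of $\mathbf{r}$ with respect to the coordinates $u^i$.

Suppose the relation between a system of curvilinear coordinates $u^1, u^2, u^3$
and an underlying system of rectangular coordinates $x_1, x_2, x_3$ ($= x, y, z$) is
given by

\[
u^i = u^i(x_k) \quad\text{and}\quad x_k = x_k(u^i), \tag{19.26}
\]

where the Jacobian

\[
J = \det\left[\frac{\partial u^i}{\partial x_j}\right]
\]

is neither zero nor infinite. Writing the latter equation in (19.26) more concisely
as

\[
\mathbf{r} = \mathbf{r}(u^i),
\]

where $\mathbf{r} = x_k\,\mathbf{i}_k$ is the position arrow of an arbitrary point, we find

\[
d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u^i}\,du^i.
\]

It then follows that

\[
(ds)^2 = d\mathbf{r}\cdot d\mathbf{r} = \frac{\partial\mathbf{r}}{\partial u^i}\cdot\frac{\partial\mathbf{r}}{\partial u^j}\,du^i\,du^j,
\]

which implies that the vectors of the local basis are

\[
\mathbf{e}_i = \frac{\partial\mathbf{r}}{\partial u^i}
\]

and the metric tensor is

\[
g_{ij} = \frac{\partial\mathbf{r}}{\partial u^i}\cdot\frac{\partial\mathbf{r}}{\partial u^j}
= \frac{\partial x_k}{\partial u^i}\,\frac{\partial x_l}{\partial u^j}\,\mathbf{i}_k\cdot\mathbf{i}_l
= \frac{\partial x_k}{\partial u^i}\,\frac{\partial x_l}{\partial u^j}\,\delta_{kl}
= \frac{\partial x_k}{\partial u^i}\,\frac{\partial x_k}{\partial u^j}.
\]

This leads to the following expression for the scale factors (for the case of
orthogonal coordinate systems):

\[
h_i = \sqrt{\left(\frac{\partial x_1}{\partial u^i}\right)^2 + \left(\frac{\partial x_2}{\partial u^i}\right)^2 + \left(\frac{\partial x_3}{\partial u^i}\right)^2}.
\]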


19.2.7 Index Lowering and Raising 


In curvilinear coordinate systems, it is possible to express a scalar product
of two vectors via several different index forms. For instance, the scalar
product of two vectors $\mathbf{a}$ and $\mathbf{b}$ may be written using their contravariant or
covariant components:

\[
\mathbf{a}\cdot\mathbf{b} = a^i\mathbf{e}_i\cdot b^j\mathbf{e}_j = g_{ij}\,a^i b^j \tag{19.27}
\]
and
\[
\mathbf{a}\cdot\mathbf{b} = a_i\mathbf{e}^i\cdot b_j\mathbf{e}^j = g^{ij}\,a_i b_j. \tag{19.28}
\]

Furthermore, we may express the scalar product in terms of the contravariant
components of one vector and the covariant components of the other:

\[
\mathbf{a}\cdot\mathbf{b} = a_i\mathbf{e}^i\cdot b^j\mathbf{e}_j = a_i b^j\,\delta^i_j = a_i b^i \tag{19.29}
\]
and
\[
\mathbf{a}\cdot\mathbf{b} = a^i\mathbf{e}_i\cdot b_j\mathbf{e}^j = a^i b_j\,\delta^j_i = a^i b_i. \tag{19.30}
\]

By comparing the four alternative expressions (19.27)–(19.30) for $\mathbf{a}\cdot\mathbf{b}$,
we can deduce the following useful property of $g_{ij}$ and $g^{ij}$. From (19.27) and
(19.30) we see that the identity

\[
g_{ij}\,a^i b^j = a^i b_i
\]

holds for any arbitrary $a^i$. Hence, we have

\[
g_{ij}\,b^j = b_i, \tag{19.31}
\]

which illustrates the fact that the covariant components $g_{ij}$ can be used to
lower the index of $b^j$. In other words, it provides a means of obtaining the
covariant components $b_i$ of a vector from its contravariant components $b^j$. By a
similar argument, we have

\[
g^{ij}\,b_j = b^i,
\]

where the contravariant components $g^{ij}$ are used to raise the index $j$ attached
to $b_j$.


@ Index lowering and raising (I):
For any vector $\mathbf{a}$, its components $a^i$ and $a_i$ are related via the components
of the metric tensor as

\[
a_i = g_{ij}\,a^j \quad\text{and}\quad a^i = g^{ij}\,a_j.
\]
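
The following minimal sketch illustrates index lowering and raising for a non-orthogonal basis, where the two kinds of components genuinely differ. It assumes a planar oblique basis $\mathbf{e}_1 = (1, 0)$, $\mathbf{e}_2 = (1, 1)$ chosen only for illustration; the variable names are not from the text.

```python
def dot(u, v):
    """Euclidean scalar product of two component lists."""
    return sum(p * q for p, q in zip(u, v))

# Oblique covariant basis in the plane (assumed example).
e = [[1.0, 0.0], [1.0, 1.0]]

# Metric g_ij = e_i . e_j and its inverse g^ij (2x2 case written out).
g = [[dot(e[i], e[j]) for j in range(2)] for i in range(2)]
det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
g_inv = [[g[1][1] / det, -g[0][1] / det],
         [-g[1][0] / det, g[0][0] / det]]

a_up = [3.0, 2.0]                                   # contravariant components a^i
a_down = [sum(g[i][j] * a_up[j] for j in range(2))  # lowering: a_i = g_ij a^j
          for i in range(2)]

# Cross-check against the definition a_i = a . e_i.
a_vec = [a_up[0] * e[0][k] + a_up[1] * e[1][k] for k in range(2)]
a_down_direct = [dot(a_vec, e[i]) for i in range(2)]

# Raising brings back the contravariant components: a^i = g^ij a_j.
a_up_again = [sum(g_inv[i][j] * a_down[j] for j in range(2)) for i in range(2)]
print(a_down, a_down_direct, a_up_again)
```

Here $a^i = (3, 2)$ lowers to $a_i = (5, 7)$, which matches the scalar products $\mathbf{a}\cdot\mathbf{e}_i$, and raising returns the original components.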


The above discussion regarding vectors can be extended to tensors of arbitrary
rank. For example, the contraction with $g_{ij}$ results in a lowering of the
corresponding index:

\[
T_{ij} = g_{ik}\,T^k{}_j = g_{jk}\,T_i{}^k. \tag{19.32}
\]

Here the positions of the indices in the mixed components emphasize the order of
occurrence of the indices; in fact, in general, $T_i{}^j \neq T^j{}_i$. Repeated contraction with $g_{ij}$
yields

\[
T_{ij} = g_{ik}\,g_{jl}\,T^{kl}.
\]

Similarly, contraction with $g^{ij}$ raises an index, i.e.,

\[
T^{ij} = g^{ik}\,T_k{}^j = g^{ik}\,g^{jl}\,T_{kl}. \tag{19.33}
\]
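
As a short worked illustration of repeated lowering, $T_{ij} = g_{ik}\,g_{jl}\,T^{kl}$, assume the cylindrical metric $[g_{ij}] = \mathrm{diag}(1, \rho^2, 1)$ of Sect. 19.2.1. Because the metric is diagonal, each sum collapses to a single term; the particular components listed are chosen only as examples.

```latex
% Assumption: cylindrical coordinates with [g_ij] = diag(1, rho^2, 1).
\begin{align*}
T_{11} &= g_{1k}\,g_{1l}\,T^{kl} = g_{11}\,g_{11}\,T^{11} = T^{11}, \\
T_{12} &= g_{1k}\,g_{2l}\,T^{kl} = g_{11}\,g_{22}\,T^{12} = \rho^{2}\,T^{12}, \\
T_{22} &= g_{2k}\,g_{2l}\,T^{kl} = g_{22}\,g_{22}\,T^{22} = \rho^{4}\,T^{22}, \\
T_{23} &= g_{2k}\,g_{3l}\,T^{kl} = g_{22}\,g_{33}\,T^{23} = \rho^{2}\,T^{23}.
\end{align*}
```

Each lowered index attached to the $\phi$ direction contributes one factor $\rho^2$, so covariant and contravariant components coincide only where no such index appears.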


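Comparable arguments are applicable to local basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ as
stated below.


@ Index lowering and raising (II):
Local basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ are related as

\[
\mathbf{e}_i = g_{ik}\,\mathbf{e}^k \quad\text{and}\quad \mathbf{e}^i = g^{ik}\,\mathbf{e}_k.
\]

Proof Since $\mathbf{a} = a^i\mathbf{e}_i = a_j\mathbf{e}^j = a^i g_{ij}\,\mathbf{e}^j$, we have

\[
a^1\left(\mathbf{e}_1 - g_{1j}\,\mathbf{e}^j\right) + a^2\left(\mathbf{e}_2 - g_{2j}\,\mathbf{e}^j\right) + a^3\left(\mathbf{e}_3 - g_{3j}\,\mathbf{e}^j\right) = 0,
\]

which holds for any vector $\mathbf{a}$. Hence, $\mathbf{e}_k - g_{kj}\,\mathbf{e}^j = 0$ for all $k$, i.e.,

\[
\mathbf{e}_k = g_{kj}\,\mathbf{e}^j.
\]

Similarly, we have

\[
\mathbf{e}^k = g^{kj}\,\mathbf{e}_j. \qquad ∎
\]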


Exercises 


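1. Show that the quantities $g_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j$ form the covariant components of a
second-order tensor.
Solution: In the new (primed) coordinate system we have
$g'_{ij} = \mathbf{e}'_i\cdot\mathbf{e}'_j$. Using the transformation law (19.7) of covariant
basis vectors, we have

\[
g'_{ij} = \frac{\partial u^k}{\partial u'^i}\,\frac{\partial u^l}{\partial u'^j}\,(\mathbf{e}_k\cdot\mathbf{e}_l)
= \frac{\partial u^k}{\partial u'^i}\,\frac{\partial u^l}{\partial u'^j}\,g_{kl}.
\]

This clearly indicates that the $g_{ij}$ are covariant components of a
second-order tensor (i.e., the metric tensor $\mathbf{g}$). A similar argument
shows that the quantities $g^{ij}$ form the contravariant components
of $\mathbf{g}$, which transform as follows:

\[
g'^{ij} = \frac{\partial u'^i}{\partial u^k}\,\frac{\partial u'^j}{\partial u^l}\,g^{kl}. \qquad ∎
\]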


2. Show that the matrix $[g^{ij}]$ is the inverse of the matrix $[g_{ij}]$.
Solution: For an arbitrary vector $\mathbf{a}$, we find $a^i = g^{ij}a_j = g^{ij}g_{jk}\,a^k$. Since $\mathbf{a}$ is
arbitrary, we must have

\[
g^{ij}\,g_{jk} = \delta^i_k = \begin{cases} 1, & i = k, \\ 0, & i \neq k. \end{cases} \tag{19.34}
\]

This clearly indicates that the matrices $[g_{ij}]$ and $[g^{ij}]$ are inverse to each
other. ∎
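3. Show that $\sqrt{g} = (\mathbf{e}_i \times \mathbf{e}_j)\cdot\mathbf{e}_k$ and $1/\sqrt{g} = (\mathbf{e}^i \times \mathbf{e}^j)\cdot\mathbf{e}^k$, where $i, j, k$ is a
cyclic permutation of the numbers 1, 2, 3.
Solution: By direct calculations, we obtain

\[
g^{il} = \mathbf{e}^i\cdot\mathbf{e}^l = \frac{(\mathbf{e}_j \times \mathbf{e}_k)\cdot(\mathbf{e}_m \times \mathbf{e}_n)}{\left[\mathbf{e}_i\cdot(\mathbf{e}_j \times \mathbf{e}_k)\right]\left[\mathbf{e}_l\cdot(\mathbf{e}_m \times \mathbf{e}_n)\right]}, \tag{19.35}
\]

where $i, j, k$ and $l, m, n$ are cyclic permutations of the ordered set
of numbers 1, 2, 3. The numerator in (19.35) reads

\[
\begin{aligned}
(\mathbf{e}_j \times \mathbf{e}_k)\cdot(\mathbf{e}_m \times \mathbf{e}_n) &= \left[(\mathbf{e}_j \times \mathbf{e}_k) \times \mathbf{e}_m\right]\cdot\mathbf{e}_n \\
&= \left[(\mathbf{e}_j\cdot\mathbf{e}_m)\,\mathbf{e}_k - (\mathbf{e}_k\cdot\mathbf{e}_m)\,\mathbf{e}_j\right]\cdot\mathbf{e}_n \\
&= (\mathbf{e}_j\cdot\mathbf{e}_m)(\mathbf{e}_k\cdot\mathbf{e}_n) - (\mathbf{e}_k\cdot\mathbf{e}_m)(\mathbf{e}_j\cdot\mathbf{e}_n) \\
&= \begin{vmatrix} \mathbf{e}_j\cdot\mathbf{e}_m & \mathbf{e}_k\cdot\mathbf{e}_m \\ \mathbf{e}_j\cdot\mathbf{e}_n & \mathbf{e}_k\cdot\mathbf{e}_n \end{vmatrix}
= \begin{vmatrix} g_{jm} & g_{km} \\ g_{jn} & g_{kn} \end{vmatrix} = C^{il}.
\end{aligned}
\]

Here, $C^{il}$ is the cofactor of $g_{il}$ in the determinant $g = \det[g_{il}]$.
Comparing the results with the definition $g^{il} = C^{il}/g$, we find
that

\[
g = \left[\mathbf{e}_i\cdot(\mathbf{e}_j \times \mathbf{e}_k)\right]\left[\mathbf{e}_l\cdot(\mathbf{e}_m \times \mathbf{e}_n)\right],
\]

which is equivalent to

\[
g = \left[\mathbf{e}_i\cdot(\mathbf{e}_j \times \mathbf{e}_k)\right]^2, \quad\text{i.e.,}\quad \sqrt{g} = \pm\,\mathbf{e}_i\cdot(\mathbf{e}_j \times \mathbf{e}_k),
\]

where the plus sign is chosen if the given basis is right-handed.
In a similar manner, via the corresponding cofactor relation for $g_{il}$ and
$\det[\delta^k_i] = \det[g_{ij}\,g^{jk}] = \det[g_{ij}]\,\det[g^{jk}] = 1$, we obtain

\[
\frac{1}{\sqrt{g}} = \pm\,\mathbf{e}^i\cdot(\mathbf{e}^j \times \mathbf{e}^k). \qquad ∎
\]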




19.3 Christoffel Symbols 


19.3.1 Derivatives of Basis Vectors 


Several new concepts are required for the differentiation of vectors or tensors
with respect to curvilinear coordinates. Recall that in a general curvilinear
coordinate system, the basis vectors $\mathbf{e}_i$ and $\mathbf{e}^i$ are functions of the coordinates.
This implies that differentiation of vectors (say, $\mathbf{v} = v^i\mathbf{e}_i$) or tensors (say,
$\mathbf{T} = T^{ij}\,\mathbf{e}_i \otimes \mathbf{e}_j$) involves their derivatives, such as $\partial\mathbf{e}_i/\partial u^j$.

Suppose that the derivative $\partial\mathbf{e}_i/\partial u^j$ can be written as a linear combination
of the basis vectors $\mathbf{e}_k$ as denoted by

\[
\frac{\partial\mathbf{e}_i}{\partial u^j} = \Gamma^k_{ij}\,\mathbf{e}_k, \tag{19.36}
\]

the symbol $\Gamma^k_{ij}$ being the coefficient associated with the $k$th component of
the linear combination. Using the reciprocity relation $\mathbf{e}^i\cdot\mathbf{e}_j = \delta^i_j$, we write
this as

\[
\Gamma^k_{ij} = \mathbf{e}^k\cdot\frac{\partial\mathbf{e}_i}{\partial u^j}. \tag{19.37}
\]

This three-index symbol is called a Christoffel symbol. In a similar manner
as above, we can show that the derivative of the contravariant basis vectors
reads

\[
\frac{\partial\mathbf{e}^i}{\partial u^j} = -\Gamma^i_{kj}\,\mathbf{e}^k. \tag{19.38}
\]

Details of the derivation are given in Exercise 1.

We shall see that Christoffel symbols play a key role in defining the derivatives
of vectors and tensors in terms of general coordinate systems. A more
formal definition of Christoffel symbols in terms of metric tensors is given in
Sect. 19.3.4.


Remark. It is clear from (19.37) that in Cartesian coordinate systems, $\Gamma^k_{ij} = 0$
for all values of the indices $i$, $j$, and $k$, owing to the identity $\partial\mathbf{e}_i/\partial u^j = 0$.


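Example Let us calculate the Christoffel symbols $\Gamma^k_{ij}$ for cylindrical
coordinates, where $(u^1, u^2, u^3) = (\rho, \phi, z)$, and the position vector $\mathbf{r}$ of any point
may be written

\[
\mathbf{r} = \rho\cos\phi\,\mathbf{i} + \rho\sin\phi\,\mathbf{j} + z\,\mathbf{k}.
\]

From this we find that the covariant basis vectors are given by

\[
\mathbf{e}_\rho = \frac{\partial\mathbf{r}}{\partial\rho} = \cos\phi\,\mathbf{i} + \sin\phi\,\mathbf{j}, \tag{19.39}
\]
\[
\mathbf{e}_\phi = \frac{\partial\mathbf{r}}{\partial\phi} = -\rho\sin\phi\,\mathbf{i} + \rho\cos\phi\,\mathbf{j}, \tag{19.40}
\]
\[
\mathbf{e}_z = \frac{\partial\mathbf{r}}{\partial z} = \mathbf{k}. \tag{19.41}
\]

It is a straightforward matter to show that the only derivatives of these vectors
with respect to the coordinates that are nonzero are

\[
\frac{\partial\mathbf{e}_\rho}{\partial\phi} = \frac{1}{\rho}\,\mathbf{e}_\phi, \quad
\frac{\partial\mathbf{e}_\phi}{\partial\rho} = \frac{1}{\rho}\,\mathbf{e}_\phi, \quad
\frac{\partial\mathbf{e}_\phi}{\partial\phi} = -\rho\,\mathbf{e}_\rho.
\]

Thus, from (19.37), we immediately have

\[
\Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{\rho} \quad\text{and}\quad \Gamma^1_{22} = -\rho. \tag{19.42}
\]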
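
The values in (19.42) can be reproduced numerically from the definition $\Gamma^k_{ij} = \mathbf{e}^k\cdot\partial\mathbf{e}_i/\partial u^j$. The sketch below is written under two assumptions that are not part of the text: derivatives are approximated by central differences, and the contravariant basis is obtained from $\mathbf{e}^k = \mathbf{e}_k/g_{kk}$ (no sum), which is valid only for orthogonal coordinates such as the cylindrical ones. All function names are illustrative.

```python
import math

def covariant_basis(u):
    """Basis vectors e_rho, e_phi, e_z of cylindrical coordinates."""
    rho, phi, _z = u
    return [[math.cos(phi), math.sin(phi), 0.0],
            [-rho * math.sin(phi), rho * math.cos(phi), 0.0],
            [0.0, 0.0, 1.0]]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def contravariant_basis(u):
    """Orthogonal coordinates only: e^k = e_k / (e_k . e_k), no sum."""
    return [[c / dot(ek, ek) for c in ek] for ek in covariant_basis(u)]

def christoffel(u, h=1e-6):
    """gamma[k][i][j] approximates e^k . d(e_i)/d(u^j)."""
    dual = contravariant_basis(u)
    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for j in range(3):
        up, dn = list(u), list(u)
        up[j] += h
        dn[j] -= h
        e_up, e_dn = covariant_basis(up), covariant_basis(dn)
        for i in range(3):
            de = [(p - q) / (2.0 * h) for p, q in zip(e_up[i], e_dn[i])]
            for k in range(3):
                gamma[k][i][j] = dot(dual[k], de)
    return gamma

gamma = christoffel([2.0, 0.3, 0.0])   # sample point with rho = 2
print(gamma[1][0][1], gamma[1][1][0], gamma[0][1][1])
```

Indices in the code are zero-based, so `gamma[1][0][1]` stands for $\Gamma^2_{12}$ and `gamma[0][1][1]` for $\Gamma^1_{22}$; at $\rho = 2$ they come out close to $1/2$ and $-2$.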


19.3.2 Nontensor Character 


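Despite their appearance, the Christoffel symbols $\Gamma^k_{ij}$ do not form the components
of a third-order tensor.


@ Theorem:
Christoffel symbols $\Gamma^k_{ij}$ do not form any kind of tensor.


Proof This is verified by considering their transformation behavior under a
general coordinate transformation. In a transformed coordinate system, we
have

\[
\Gamma'^k_{ij} = \mathbf{e}'^k\cdot\frac{\partial\mathbf{e}'_i}{\partial u'^j}. \tag{19.43}
\]

Applying the transformation law of local basis vectors, we obtain

\[
\begin{aligned}
\Gamma'^k_{ij} &= \frac{\partial u'^k}{\partial u^n}\,\mathbf{e}^n\cdot\frac{\partial}{\partial u'^j}\left(\frac{\partial u^l}{\partial u'^i}\,\mathbf{e}_l\right) \\
&= \frac{\partial u'^k}{\partial u^n}\,\mathbf{e}^n\cdot\left(\frac{\partial^2 u^l}{\partial u'^j\,\partial u'^i}\,\mathbf{e}_l + \frac{\partial u^l}{\partial u'^i}\,\frac{\partial\mathbf{e}_l}{\partial u'^j}\right) \\
&= \frac{\partial u'^k}{\partial u^n}\,\frac{\partial^2 u^l}{\partial u'^j\,\partial u'^i}\,\delta^n_l + \frac{\partial u'^k}{\partial u^n}\,\frac{\partial u^l}{\partial u'^i}\,\mathbf{e}^n\cdot\frac{\partial\mathbf{e}_l}{\partial u'^j} \\
&= \frac{\partial u'^k}{\partial u^l}\,\frac{\partial^2 u^l}{\partial u'^j\,\partial u'^i} + \frac{\partial u'^k}{\partial u^n}\,\frac{\partial u^l}{\partial u'^i}\,\frac{\partial u^m}{\partial u'^j}\,\mathbf{e}^n\cdot\frac{\partial\mathbf{e}_l}{\partial u^m} \\
&= \frac{\partial u'^k}{\partial u^l}\,\frac{\partial^2 u^l}{\partial u'^j\,\partial u'^i} + \frac{\partial u'^k}{\partial u^n}\,\frac{\partial u^l}{\partial u'^i}\,\frac{\partial u^m}{\partial u'^j}\,\Gamma^n_{lm}.
\end{aligned}
\tag{19.44}
\]

Hence, the presence of the first term in the last line in (19.44) prevents the
$\Gamma^k_{ij}$ from forming a third-order tensor. ∎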


19.3.3 Properties of Christoffel Symbols 


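Christoffel symbols $\Gamma^k_{ij}$ satisfy the following relations:

1. $\Gamma^k_{ij} = \Gamma^k_{ji}$.

2. $\dfrac{\partial g_{ij}}{\partial u^k} = g_{lj}\,\Gamma^l_{ik} + g_{il}\,\Gamma^l_{jk}$.

3. $\dfrac{\partial g^{ij}}{\partial u^k} = -g^{lj}\,\Gamma^i_{lk} - g^{il}\,\Gamma^j_{lk}$.

4. $\Gamma^i_{ik} = \dfrac{\partial}{\partial u^k}\log\sqrt{|g|} = \dfrac{1}{\sqrt{|g|}}\,\dfrac{\partial\sqrt{|g|}}{\partial u^k}$.

Proofs of these relations are given in Exercises 2-5.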


Remark. Some textbooks refer to our three-index symbol $\Gamma^k_{ij}$ defined by (19.37)
as the Christoffel symbol of the second kind and use the following notation:

\[
\begin{Bmatrix} k \\ i\;j \end{Bmatrix} = \Gamma^k_{ij} = \mathbf{e}^k\cdot\frac{\partial\mathbf{e}_i}{\partial u^j}. \tag{19.45}
\]

As a counterpart, we may define the Christoffel symbol of the first kind
$[k, ij]$ by

\[
[k, ij] = \mathbf{e}_k\cdot\frac{\partial\mathbf{e}_i}{\partial u^j}. \tag{19.46}
\]

Note that the index $k$ of the basis vector on the right-hand side of (19.46) is a subscript, whereas
that of (19.45) is a superscript. These two kinds of Christoffel symbols are related
to each other as

\[
\begin{Bmatrix} k \\ i\;j \end{Bmatrix} = g^{kl}\,[l, ij].
\]


19.3.4 Alternative Expression


In principle, we can calculate the $\Gamma^k_{ij}$ in a given coordinate system using the
expression (19.37) based on $\mathbf{e}_i$. However, it is often simpler to use an alternative
expression in terms of the metric tensor $g_{ij}$ and its derivatives as stated below.


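@ Theorem:
Christoffel symbols are expressed as

\[
\Gamma^m_{ij} = \frac{1}{2}\,g^{mk}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right).
\]

Proof Recall the relation

\[
\frac{\partial g_{ij}}{\partial u^k} = \Gamma^l_{ik}\,g_{lj} + \Gamma^l_{jk}\,g_{il}, \tag{19.47}
\]

given in Sect. 19.3.3. By cyclically permuting the free indices $i, j, k$, we obtain
two further equivalent relations:

\[
\frac{\partial g_{jk}}{\partial u^i} = \Gamma^l_{ji}\,g_{lk} + \Gamma^l_{ki}\,g_{jl} \tag{19.48}
\]
and
\[
\frac{\partial g_{ki}}{\partial u^j} = \Gamma^l_{kj}\,g_{li} + \Gamma^l_{ij}\,g_{kl}. \tag{19.49}
\]

Then, subtracting (19.47) from the sum of (19.48) and (19.49), we find

\[
\begin{aligned}
\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}
&= \Gamma^l_{ji}\,g_{lk} + \Gamma^l_{ki}\,g_{jl} + \Gamma^l_{kj}\,g_{li} + \Gamma^l_{ij}\,g_{kl} - \Gamma^l_{ik}\,g_{lj} - \Gamma^l_{jk}\,g_{il} \\
&= \left(\Gamma^l_{ji}\,g_{lk} + \Gamma^l_{ij}\,g_{kl}\right) + \left(\Gamma^l_{ki}\,g_{jl} - \Gamma^l_{ik}\,g_{lj}\right) + \left(\Gamma^l_{kj}\,g_{li} - \Gamma^l_{jk}\,g_{il}\right) \\
&= 2\,\Gamma^l_{ij}\,g_{kl} + 0 + 0 = 2\,\Gamma^l_{ij}\,g_{kl},
\end{aligned}
\tag{19.50}
\]

where we have used the symmetry properties $g_{ij} = g_{ji}$ and $\Gamma^k_{ij} = \Gamma^k_{ji}$.
Contracting both sides with $g^{mk}$ yields

\[
g^{mk}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right) = 2\,\Gamma^l_{ij}\,g_{kl}\,g^{mk} = 2\,\Gamma^l_{ij}\,\delta^m_l = 2\,\Gamma^m_{ij},
\]

i.e.,

\[
\Gamma^m_{ij} = \frac{1}{2}\,g^{mk}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right). \qquad ∎ \tag{19.51}
\]

This result enables us to compute the Christoffel symbol of a given coordinate
system from information about the metric tensor.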


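Examples We again evaluate the Christoffel symbols $\Gamma^k_{ij}$ for cylindrical
coordinates. Using (19.51) and the fact that $g_{11} = 1$, $g_{22} = \rho^2$, $g_{33} = 1$ and
the other components are zero, we see that the only three nonzero Christoffel
symbols are indeed $\Gamma^2_{12} = \Gamma^2_{21}$ and $\Gamma^1_{22}$. Given by

\[
\Gamma^2_{12} = \Gamma^2_{21} = \frac{1}{2g_{22}}\,\frac{\partial g_{22}}{\partial u^1} = \frac{1}{2\rho^2}\,\frac{\partial\rho^2}{\partial\rho} = \frac{1}{\rho}, \tag{19.52}
\]
\[
\Gamma^1_{22} = -\frac{1}{2g_{11}}\,\frac{\partial g_{22}}{\partial u^1} = -\frac{1}{2}\,\frac{\partial\rho^2}{\partial\rho} = -\rho, \tag{19.53}
\]

they agree with the expressions in (19.42).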
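
Formula (19.51) lends itself to direct numerical evaluation once the metric is known as a function of the coordinates. The following sketch makes two simplifying assumptions that are not in the text: the metric derivatives are taken by central differences, and the inverse metric is formed by inverting the diagonal entries, which restricts the helper to orthogonal coordinates. It is applied to the cylindrical metric of the example; all names are illustrative.

```python
def metric_cyl(u):
    """Covariant metric of cylindrical coordinates u = (rho, phi, z)."""
    rho = u[0]
    return [[1.0, 0.0, 0.0],
            [0.0, rho**2, 0.0],
            [0.0, 0.0, 1.0]]

def metric_derivative(metric, u, k, h=1e-6):
    """Central-difference approximation of d(g_ij)/d(u^k) for all i, j."""
    up, dn = list(u), list(u)
    up[k] += h
    dn[k] -= h
    gp, gm = metric(up), metric(dn)
    return [[(gp[i][j] - gm[i][j]) / (2.0 * h) for j in range(3)] for i in range(3)]

def christoffel_from_metric(metric, u):
    """gamma[m][i][j] from (19.51); assumes a diagonal metric when inverting."""
    g = metric(u)
    g_inv = [[(1.0 / g[i][i] if i == j else 0.0) for j in range(3)] for i in range(3)]
    dg = [metric_derivative(metric, u, k) for k in range(3)]  # dg[k][i][j] = d g_ij / d u^k
    gamma = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
    for m in range(3):
        for i in range(3):
            for j in range(3):
                gamma[m][i][j] = 0.5 * sum(
                    g_inv[m][k] * (dg[i][j][k] + dg[j][k][i] - dg[k][i][j])
                    for k in range(3))
    return gamma

gamma_metric = christoffel_from_metric(metric_cyl, [2.0, 0.3, 0.0])
print(gamma_metric[1][0][1], gamma_metric[0][1][1])
```

With zero-based indices, `gamma_metric[1][0][1]` corresponds to $\Gamma^2_{12}$ and `gamma_metric[0][1][1]` to $\Gamma^1_{22}$; at $\rho = 2$ they are close to $1/2$ and $-2$, in line with (19.52) and (19.53).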


Remark. The result (19.51) implies that the Christoffel symbol of the first
kind $[k, ij]$ mentioned in (19.46) is written as

\[
[k, ij] = \frac{1}{2}\left(\frac{\partial g_{jk}}{\partial u^i} + \frac{\partial g_{ki}}{\partial u^j} - \frac{\partial g_{ij}}{\partial u^k}\right).
\]


Exercises 


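1. Derive equation (19.38).
Solution: By differentiating the reciprocity relation $\mathbf{e}^i\cdot\mathbf{e}_j = \delta^i_j$
with respect to the coordinates, we have

\[
\frac{\partial\mathbf{e}^i}{\partial u^k}\cdot\mathbf{e}_j + \mathbf{e}^i\cdot\frac{\partial\mathbf{e}_j}{\partial u^k} = \frac{\partial\delta^i_j}{\partial u^k}. \tag{19.54}
\]

The right-hand side of (19.54) vanishes since the element $\delta^i_j$ consists
of the constants 1 and 0, which are independent of the coordinates
$u^k$. Hence, using (19.37), we obtain

\[
\frac{\partial\mathbf{e}^i}{\partial u^k}\cdot\mathbf{e}_j = -\mathbf{e}^i\cdot\frac{\partial\mathbf{e}_j}{\partial u^k} = -\Gamma^i_{jk}. \tag{19.55}
\]

Similar to the case of (19.36), for the moment we write the derivative
$\partial\mathbf{e}^i/\partial u^k$ as a linear combination of the basis vectors $\mathbf{e}^l$ as

\[
\frac{\partial\mathbf{e}^i}{\partial u^k} = B^i_{lk}\,\mathbf{e}^l. \tag{19.56}
\]

Substituting (19.56) into (19.55), we obtain $B^i_{jk} = -\Gamma^i_{jk}$. Consequently,
we have $\dfrac{\partial\mathbf{e}^i}{\partial u^k} = -\Gamma^i_{jk}\,\mathbf{e}^j$, or equivalently (by interchanging
the subscripts),

\[
\frac{\partial\mathbf{e}^i}{\partial u^j} = -\Gamma^i_{kj}\,\mathbf{e}^k. \qquad ∎
\]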

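2. Show that $\Gamma^k_{ij} = \Gamma^k_{ji}$.
Solution: It follows that

\[
\frac{\partial\mathbf{e}_i}{\partial u^j} = \frac{\partial^2\mathbf{r}}{\partial u^j\,\partial u^i} = \frac{\partial^2\mathbf{r}}{\partial u^i\,\partial u^j} = \frac{\partial\mathbf{e}_j}{\partial u^i},
\]

which yields

\[
\Gamma^k_{ij} = \mathbf{e}^k\cdot\frac{\partial\mathbf{e}_i}{\partial u^j} = \mathbf{e}^k\cdot\frac{\partial\mathbf{e}_j}{\partial u^i} = \Gamma^k_{ji}. \qquad ∎
\]

3. Show that $\dfrac{\partial g_{ij}}{\partial u^k} = \Gamma^l_{ik}\,g_{lj} + \Gamma^l_{jk}\,g_{il}$.
Solution: Derivatives of the metric tensor $g_{ij} = \mathbf{e}_i\cdot\mathbf{e}_j$ with
respect to $u^k$ read

\[
\frac{\partial g_{ij}}{\partial u^k} = \frac{\partial\mathbf{e}_i}{\partial u^k}\cdot\mathbf{e}_j + \mathbf{e}_i\cdot\frac{\partial\mathbf{e}_j}{\partial u^k}
= \Gamma^l_{ik}\,\mathbf{e}_l\cdot\mathbf{e}_j + \mathbf{e}_i\cdot\Gamma^l_{jk}\,\mathbf{e}_l
= \Gamma^l_{ik}\,g_{lj} + \Gamma^l_{jk}\,g_{il}. \qquad ∎
\]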


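4. Show that $\dfrac{\partial g}{\partial u^k} = g\,g^{ij}\,\dfrac{\partial g_{ij}}{\partial u^k}$, where $g = \det[g_{ij}]$.
Solution: We know that the determinant $g$ is given by (see
Sect. 18.1.7)

\[
g = \sum_{j=1}^{n} g_{ij}\,C^{ij} \quad (\text{with } i \text{ fixed}),
\]

where $C^{ij}$ is the cofactor of the element $g_{ij}$ in $g$. Partially differentiating
both sides with respect to $g_{ij}$ gives

\[
\frac{\partial g}{\partial g_{ij}} = C^{ij}. \tag{19.57}
\]

Since $g^{ij} = C^{ij}/g$ (see Sect. 19.2.3), it follows from (19.57) that

\[
\frac{\partial g}{\partial u^k} = \frac{\partial g}{\partial g_{ij}}\,\frac{\partial g_{ij}}{\partial u^k} = C^{ij}\,\frac{\partial g_{ij}}{\partial u^k} = g\,g^{ij}\,\frac{\partial g_{ij}}{\partial u^k}. \qquad ∎
\]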
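5. Show that $\Gamma^i_{ik} = \dfrac{\partial}{\partial u^k}\log\sqrt{g}$.
Solution: According to the expression (19.51), we have

\[
\Gamma^i_{ik} = \frac{1}{2}\,g^{il}\left(\frac{\partial g_{il}}{\partial u^k} + \frac{\partial g_{kl}}{\partial u^i} - \frac{\partial g_{ki}}{\partial u^l}\right).
\]

The last two terms in the parentheses cancel out because

\[
g^{il}\,\frac{\partial g_{kl}}{\partial u^i} = g^{li}\,\frac{\partial g_{ki}}{\partial u^l} = g^{il}\,\frac{\partial g_{ki}}{\partial u^l},
\]

where we have interchanged the dummy indices $i$ and $l$ in the first
equality, and have used the symmetry of the metric tensor in the
second. Hence, we get

\[
\Gamma^i_{ik} = \frac{1}{2}\,g^{il}\,\frac{\partial g_{il}}{\partial u^k}.
\]

This can be further simplified by using the result of Exercise 4 as

\[
\Gamma^i_{ik} = \frac{1}{2g}\,\frac{\partial g}{\partial u^k} = \frac{1}{2g}\,\frac{\partial g}{\partial\sqrt{g}}\,\frac{\partial\sqrt{g}}{\partial u^k} = \frac{1}{\sqrt{g}}\,\frac{\partial\sqrt{g}}{\partial u^k} = \frac{\partial}{\partial u^k}\log\sqrt{g}. \qquad ∎
\]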


19.4 Covariant Derivatives 


19.4.1 Covariant Derivatives of Vectors 


The derivatives of a scalar in terms of Cartesian coordinates work as covariant
components of a vector. This is also true for the case of general coordinate
systems, as can be shown by considering the differential of a scalar

\[
d\phi = \frac{\partial\phi}{\partial u^i}\,du^i.
\]

Since the $du^i$ are contravariant components of a vector and $d\phi$ is a scalar,
we see from the quotient law that the quantities $\partial\phi/\partial u^i$ must form covariant
components of a vector.

Except for a scalar, however, the derivatives of a general tensor do not
necessarily form the components of another tensor. To see this, we consider
the derivative of the contravariant components $v^i$ of a vector $\mathbf{v}$ with respect to a
general coordinate $u^j$. In a new (primed) coordinate system, it reads

\[
\begin{aligned}
\frac{\partial v'^i}{\partial u'^j} &= \frac{\partial u^k}{\partial u'^j}\,\frac{\partial v'^i}{\partial u^k}
= \frac{\partial u^k}{\partial u'^j}\,\frac{\partial}{\partial u^k}\left(\frac{\partial u'^i}{\partial u^l}\,v^l\right) \\
&= \frac{\partial u^k}{\partial u'^j}\,\frac{\partial u'^i}{\partial u^l}\,\frac{\partial v^l}{\partial u^k} + \frac{\partial u^k}{\partial u'^j}\,\frac{\partial^2 u'^i}{\partial u^k\,\partial u^l}\,v^l.
\end{aligned}
\tag{19.59}
\]

The presence of the second term in the last line of (19.59) prevents the derivative
$\partial v^i/\partial u^j$ from obeying the transformation law of the components of a
second-order tensor. The nontensor character stems from the fact that the
second-order derivative,

\[
\frac{\partial^2 u'^i}{\partial u^k\,\partial u^l}, \tag{19.60}
\]

involved in the last line of (19.59) does not vanish. In fact, the first-order
derivative $\partial u'^i/\partial u^l$ is not constant in non-Cartesian coordinates, whereas it
is constant in Cartesian coordinates [so that the term (19.60) vanishes in the
latter case].

In the context above, it is natural to introduce a new class of differentiation
that turns the derivatives of components of a tensor into components of
another tensor. This is achieved with the help of the Christoffel symbols
discussed in Sect. 19.3. Let us consider the derivative of a vector $\mathbf{v}$ with respect
to the coordinates $u^j$. We find

\[
\frac{\partial\mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^i\,\frac{\partial\mathbf{e}_i}{\partial u^j},
\]

where the second term arises because, in general, the basis vectors $\mathbf{e}_i$ are not
constant. Using (19.36), we write

\[
\frac{\partial\mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^i\,\Gamma^k_{ij}\,\mathbf{e}_k. \tag{19.61}
\]

Since $i$ and $k$ are dummy indices, we may interchange them to obtain

\[
\frac{\partial\mathbf{v}}{\partial u^j} = \frac{\partial v^i}{\partial u^j}\,\mathbf{e}_i + v^k\,\Gamma^i_{kj}\,\mathbf{e}_i
= \left(\frac{\partial v^i}{\partial u^j} + v^k\,\Gamma^i_{kj}\right)\mathbf{e}_i. \tag{19.62}
\]

The quantity in parentheses is referred to specifically as the covariant
derivative of a vector:


@ Covariant derivative of a vector:
The quantities defined by

\[
v^i{}_{;j} = \frac{\partial v^i}{\partial u^j} + \Gamma^i_{kj}\,v^k \tag{19.63}
\]

are called covariant derivatives of contravariant components $v^i$ of a vector
$\mathbf{v}$ with respect to $u^j$. Here, the semicolon subscript on the left-hand
side denotes covariant differentiation.


Using this notation, we may write the derivative of a vector in the very
compact form

\[
\frac{\partial\mathbf{v}}{\partial u^j} = v^i{}_{;j}\,\mathbf{e}_i.
\]

The corresponding result for the covariant components $v_i$ can be found in a
similar way by considering the derivative of $\mathbf{v} = v_i\mathbf{e}^i$ and using (19.38) to
obtain

\[
v_{i;j} = \frac{\partial v_i}{\partial u^j} - \Gamma^k_{ij}\,v_k. \tag{19.64}
\]
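
A simple way to see what the Christoffel terms accomplish is to take a vector field that is constant in space and express it in curvilinear components. The sketch below assumes plane polar coordinates $(\rho, \phi)$, whose only nonzero symbols are $\Gamma^1_{22} = -\rho$ and $\Gamma^2_{12} = \Gamma^2_{21} = 1/\rho$, and the constant field $\mathbf{v} = \mathbf{i}$ with contravariant components $v^1 = \cos\phi$, $v^2 = -\sin\phi/\rho$. The partial derivatives are approximated by central differences, and all names are illustrative.

```python
import math

def v_components(u):
    """Contravariant polar components of the constant Cartesian field v = i."""
    rho, phi = u
    return [math.cos(phi), -math.sin(phi) / rho]

def christoffel_polar(u):
    """G[i][k][j] = Gamma^i_kj for plane polar coordinates (zero-based)."""
    rho = u[0]
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -rho
    G[1][0][1] = 1.0 / rho
    G[1][1][0] = 1.0 / rho
    return G

def partial_derivative(u, j, h=1e-6):
    """Central-difference approximation of d(v^i)/d(u^j) for i = 0, 1."""
    up, dn = list(u), list(u)
    up[j] += h
    dn[j] -= h
    vp, vm = v_components(up), v_components(dn)
    return [(p - m) / (2.0 * h) for p, m in zip(vp, vm)]

def covariant_derivative(u):
    """D[i][j] = d(v^i)/d(u^j) + Gamma^i_kj v^k, following (19.63)."""
    v, G = v_components(u), christoffel_polar(u)
    D = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        dv = partial_derivative(u, j)
        for i in range(2):
            D[i][j] = dv[i] + sum(G[i][k][j] * v[k] for k in range(2))
    return D

point = [1.5, 0.7]
D = covariant_derivative(point)
plain = partial_derivative(point, 1)   # ordinary partial derivatives w.r.t. phi
print(D, plain)
```

The ordinary partial derivatives of the components are not zero, yet every covariant derivative vanishes up to rounding, as expected for a field that does not change from point to point.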


19.4.2 Remarks on Covariant Derivatives 


1. The arrangement of indices i, j, k in the Christoffel symbols in (19.63) and (19.64) can be determined systematically in the following manner. First, the index with respect to which the derivative is taken (i.e., j in this case) is the last subscript on the Christoffel symbol. Second, the other index appearing on the left-hand side (i.e., i in this case) also appears in the Christoffel symbol on the right-hand side without being raised or lowered. The remaining index can then be arranged in only one way.

2. Similar to $v^i_{;j}$, a comparable shorthand notation for partial derivatives is obtained by replacing the semicolon by a comma, such as
\[
v^i_{,j} \equiv \frac{\partial v^i}{\partial u^j} \quad \text{and} \quad v_{i,j} \equiv \frac{\partial v_i}{\partial u^j}.
\]

3. In Cartesian coordinates, all the $\Gamma^i_{kj}$ are zero, so the covariant derivative reduces to the simple partial derivative, say, $v^i_{;j} = v^i_{,j}$.
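As a numerical illustration of (19.63), the following sketch evaluates $v^i_{;j}$ in plane polar coordinates $(r, \theta)$ for the constant Cartesian field $e_x$. The choice of coordinates and field, the function names, and the finite-difference step are assumptions made for this example only; the Christoffel symbols used are the standard ones for polar coordinates ($\Gamma^r_{\theta\theta} = -r$, $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$). The partial derivatives of the components do not vanish, but the covariant derivatives do, as expected for a constant vector field.

```python
import math

# Plane polar coordinates (u^1, u^2) = (r, theta); index 0 -> r, 1 -> theta.

def christoffel(r, theta):
    """Return G[i][k][j] = Gamma^i_{kj} for plane polar coordinates."""
    G = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
    G[0][1][1] = -r          # Gamma^r_{theta theta}
    G[1][0][1] = 1.0 / r     # Gamma^theta_{r theta}
    G[1][1][0] = 1.0 / r     # Gamma^theta_{theta r}
    return G

def v_contra(r, theta):
    """Contravariant polar components of the constant Cartesian field e_x."""
    return [math.cos(theta), -math.sin(theta) / r]

def covariant_derivative(v, r, theta, h=1e-5):
    """Return D[i][j] = dv^i/du^j + Gamma^i_{kj} v^k, following (19.63)."""
    u = [r, theta]
    G = christoffel(r, theta)
    vk = v(r, theta)
    D = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        vp, vm = v(*up), v(*um)
        for i in range(2):
            partial = (vp[i] - vm[i]) / (2.0 * h)   # central difference
            D[i][j] = partial + sum(G[i][k][j] * vk[k] for k in range(2))
    return D

D = covariant_derivative(v_contra, 2.0, 0.7)
print(D)   # all four entries vanish up to finite-difference error
```

The same routine can be reused for any other field by passing a different function in place of `v_contra`.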


19.4.3 Covariant Derivatives of Tensors 


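Covariant derivatives of higher-order tensors can be defined by a procedure similar to the one for vectors. As an example, let us consider the derivative of the second-order tensor T with respect to the coordinate $u^k$. Expressing T in terms of its contravariant components, we have
\[
\begin{aligned}
\frac{\partial T}{\partial u^k} &= \frac{\partial}{\partial u^k}\left( T^{ij}\, e_i \otimes e_j \right) \\
&= \frac{\partial T^{ij}}{\partial u^k}\, e_i \otimes e_j + T^{ij}\, \frac{\partial e_i}{\partial u^k} \otimes e_j + T^{ij}\, e_i \otimes \frac{\partial e_j}{\partial u^k}.
\end{aligned} \tag{19.65}
\]
Using Christoffel symbols, we obtain
\[
\frac{\partial T}{\partial u^k} = \frac{\partial T^{ij}}{\partial u^k}\, e_i \otimes e_j + T^{ij}\, \Gamma^l_{ik}\, e_l \otimes e_j + T^{ij}\, e_i \otimes \Gamma^l_{jk}\, e_l.
\]
Interchanging the dummy indices i and l in the second term and j and l in the third term on the right-hand side, we get
\[
\frac{\partial T}{\partial u^k} = \left( \frac{\partial T^{ij}}{\partial u^k} + \Gamma^i_{lk} T^{lj} + \Gamma^j_{lk} T^{il} \right) e_i \otimes e_j,
\]
where the expression in parentheses is the required covariant derivative defined by
\[
T^{ij}_{;k} \equiv \frac{\partial T^{ij}}{\partial u^k} + \Gamma^i_{lk} T^{lj} + \Gamma^j_{lk} T^{il}. \tag{19.66}
\]
Using the notation (19.66), we can write the derivative of the tensor T with respect to $u^k$ as
\[
\frac{\partial T}{\partial u^k} = T^{ij}_{;k}\, e_i \otimes e_j.
\]
Results similar to (19.66) can be obtained for the covariant derivatives of the mixed and covariant components of a second-order tensor. Collecting all of these results leads to the following: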


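@ Covariant derivative of a tensor:
Covariant derivatives of components of a second-order tensor T are given by
\[
\begin{aligned}
T^{ij}_{;k} &= T^{ij}_{,k} + \Gamma^i_{lk} T^{lj} + \Gamma^j_{lk} T^{il}, \\
T^{i}_{j;k} &= T^{i}_{j,k} + \Gamma^i_{lk} T^{l}_{j} - \Gamma^l_{jk} T^{i}_{l}, \\
T_{ij;k} &= T_{ij,k} - \Gamma^l_{ik} T_{lj} - \Gamma^l_{jk} T_{il},
\end{aligned}
\]
where the comma notation means the taking of partial derivatives.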


The position of the indices in the expressions is very systematic. We focus on the index i or j on the left-hand side. First, the index k with respect to which the derivative is taken should be the last subscript on the Christoffel symbol. Next, if the index (i or j) on the left-hand side is a superscript, then the corresponding term on the right-hand side containing a Christoffel symbol carries a plus sign. In contrast, when the index on the left-hand side is a subscript, the corresponding term on the right carries a minus sign. We can extend this in a straightforward manner to tensors with an arbitrary number of contravariant and covariant indices.


Remark.


1. All of the quantities $T^{ij}_{;k}$, $T^{i}_{j;k}$, and $T_{ij;k}$ are the components of the same third-order tensor $\nabla T$ with respect to different tensor bases, i.e.,
\[
\nabla T = T^{ij}_{;k}\, e_i \otimes e_j \otimes e^k = T^{i}_{j;k}\, e_i \otimes e^j \otimes e^k = T_{ij;k}\, e^i \otimes e^j \otimes e^k.
\]

2. In general, we may call the $v^i_{;j}$ the covariant derivative of v and denote it by $\nabla v$. In Cartesian coordinates, its components are just $\partial v^i/\partial x^j$.

3. Given a metric tensor g, the covariant derivatives of its components, $g_{ij;k}$ and $g^{ij}_{;k}$, are identically zero in terms of arbitrary coordinates. This is called Ricci's theorem, for which we give the proof in Exercises 2 and 3.
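Ricci's theorem can be checked numerically in a simple case. The sketch below assumes plane polar coordinates with metric $g_{ij} = \mathrm{diag}(1, r^2)$ and the standard polar Christoffel symbols, and evaluates $g_{ij;k} = g_{ij,k} - \Gamma^l_{ik} g_{lj} - \Gamma^l_{jk} g_{il}$ with central differences. The function names and the sample point are illustrative assumptions, not part of the text.

```python
def metric(u):
    """Covariant metric g_{ij} of plane polar coordinates u = (r, theta)."""
    r, _theta = u
    return [[1.0, 0.0], [0.0, r * r]]

def christoffel(u):
    """G[l][i][k] = Gamma^l_{ik} for plane polar coordinates."""
    r, _theta = u
    G = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def metric_covariant_derivative(u, h=1e-5):
    """Return D[i][j][k] = g_{ij;k} at the point u."""
    g, G = metric(u), christoffel(u)
    D = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
    for k in range(2):
        up, um = list(u), list(u)
        up[k] += h
        um[k] -= h
        gp, gm = metric(up), metric(um)
        for i in range(2):
            for j in range(2):
                partial = (gp[i][j] - gm[i][j]) / (2.0 * h)   # g_{ij,k}
                D[i][j][k] = (partial
                              - sum(G[l][i][k] * g[l][j] for l in range(2))
                              - sum(G[l][j][k] * g[i][l] for l in range(2)))
    return D

D_metric = metric_covariant_derivative([1.7, 0.4])
largest = max(abs(D_metric[i][j][k])
              for i in range(2) for j in range(2) for k in range(2))
print(largest)   # zero up to rounding error
```

Note that the partial derivative $g_{\theta\theta,r} = 2r$ is nonzero by itself; it is cancelled by the two Christoffel terms.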


19.4.4 Vector Operators in Tensor Form 


This subsection is devoted to finding expressions for vector differential opera- 
tors such as grad, div, rot, and the Laplacian in tensor form that are valid in 
general coordinate systems. In principle, they are obtained in a straightfor- 
ward manner by replacing the partial derivative given in Cartesian coordinates 
with covariant derivatives. These tensor forms, however, can be simplified by 
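using the metric tensor $g_{ij}$ as shown below.


1. Gradient: The gradient of a scalar $\phi$ in a general coordinate system is given by
\[
\nabla \phi = \phi_{;i}\, e^i = \frac{\partial \phi}{\partial u^i}\, e^i, \tag{19.67}
\]
since the covariant derivative of a scalar is the same as its partial derivative.

2. Divergence: The tensor form of the divergence of a vector v is given by
\[
\nabla \cdot v = v^i_{;i} = \frac{\partial v^i}{\partial u^i} + \Gamma^i_{ki} v^k. \tag{19.68}
\]
Observe that the index i appears twice in the Christoffel symbol. Using the expression (see Sect. 19.3.3)
\[
\Gamma^i_{ki} = \frac{\partial}{\partial u^k} \log \sqrt{g} = \frac{1}{\sqrt{g}} \frac{\partial \sqrt{g}}{\partial u^k},
\]
we obtain a more compact form:
\[
\begin{aligned}
v^i_{;i} &= \frac{\partial v^i}{\partial u^i} + \Gamma^i_{ki} v^k = \frac{\partial v^i}{\partial u^i} + \left( \frac{1}{\sqrt{g}} \frac{\partial \sqrt{g}}{\partial u^k} \right) v^k \\
&= \frac{1}{\sqrt{g}} \left( \sqrt{g}\, \frac{\partial v^k}{\partial u^k} \right) + \frac{1}{\sqrt{g}} \left( \frac{\partial \sqrt{g}}{\partial u^k} \right) v^k = \frac{1}{\sqrt{g}} \frac{\partial}{\partial u^k} \left( \sqrt{g}\, v^k \right).
\end{aligned} \tag{19.69}
\]

3. Laplacian: The tensor form of the Laplacian $\nabla^2 \phi$ is obtained by making use of the following relation:
\[
\nabla^2 \phi = \nabla \cdot (\nabla \phi) = \nabla \cdot v = v^i_{;i},
\]
where we assume that $v = \nabla \phi$. From (19.67), we have
\[
v_i e^i = v = \nabla \phi = \frac{\partial \phi}{\partial u^i}\, e^i.
\]
Thus the covariant components of v are given by
\[
v_i = \frac{\partial \phi}{\partial u^i},
\]
and its contravariant components $v^j$ can be obtained by raising the index using the metric tensor:
\[
v^j = g^{jk} v_k = g^{jk} \frac{\partial \phi}{\partial u^k}.
\]
Substituting this into (19.69), we finally arrive at
\[
\nabla^2 \phi = v^j_{;j} = \frac{1}{\sqrt{g}} \frac{\partial}{\partial u^j} \left( \sqrt{g}\, g^{jk} \frac{\partial \phi}{\partial u^k} \right). \tag{19.70}
\]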

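4. Rotation: In general curvilinear coordinates, the operation $\nabla \times v$ is defined by
\[
[\nabla \times v]_{ij} = v_{i;j} - v_{j;i}, \tag{19.71}
\]
which forms covariant components of an antisymmetric tensor. The right-hand side of (19.71) can be simplified as
\[
\begin{aligned}
v_{i;j} - v_{j;i} &= \frac{\partial v_i}{\partial u^j} - \Gamma^l_{ij} v_l - \frac{\partial v_j}{\partial u^i} + \Gamma^l_{ji} v_l \\
&= \frac{\partial v_i}{\partial u^j} - \frac{\partial v_j}{\partial u^i},
\end{aligned} \tag{19.72}
\]
where the Christoffel symbols cancel out owing to their symmetry in the lower indices. Therefore, components of the tensor $\nabla \times v$ can be written in terms of partial derivatives as
\[
[\nabla \times v]_{ij} = \frac{\partial v_i}{\partial u^j} - \frac{\partial v_j}{\partial u^i}. \tag{19.73}
\]


Our results are summarized as follows:


@ Vector operators in tensor forms:

1. $\nabla \phi = \dfrac{\partial \phi}{\partial u^i}\, e^i$
2. $\nabla \cdot v = \dfrac{1}{\sqrt{g}} \dfrac{\partial}{\partial u^i} \left( \sqrt{g}\, v^i \right)$
3. $\nabla^2 \phi = \dfrac{1}{\sqrt{g}} \dfrac{\partial}{\partial u^j} \left( \sqrt{g}\, g^{jk} \dfrac{\partial \phi}{\partial u^k} \right)$
4. $[\nabla \times v]_{ij} = \dfrac{\partial v_i}{\partial u^j} - \dfrac{\partial v_j}{\partial u^i}$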

Exercises 


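1. Prove that the covariant derivatives $v^k_{;j}$ form a second-order tensor of type (1,1).
Solution: Employ the transformation laws of $v^k$ and $\Gamma^k_{ij}$ [see (19.44)] to obtain
\[
\begin{aligned}
v^k_{;j} &= \frac{\partial v^k}{\partial u^j} + \Gamma^k_{ij} v^i \\
&= \frac{\partial}{\partial u^j} \left( \frac{\partial u^k}{\partial u'^l} v'^l \right) + \left( \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^i} \frac{\partial u'^n}{\partial u^j} \Gamma'^l_{mn} + \frac{\partial u^k}{\partial u'^l} \frac{\partial^2 u'^l}{\partial u^i \partial u^j} \right) \left( \frac{\partial u^i}{\partial u'^p} v'^p \right) \\
&= \frac{\partial^2 u^k}{\partial u'^m \partial u'^l} \frac{\partial u'^m}{\partial u^j} v'^l + \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^j} \frac{\partial v'^l}{\partial u'^m} \\
&\quad + \frac{\partial u^k}{\partial u'^l} \left( \frac{\partial u'^m}{\partial u^i} \frac{\partial u'^n}{\partial u^j} \Gamma'^l_{mn} + \frac{\partial^2 u'^l}{\partial u^i \partial u^j} \right) \frac{\partial u^i}{\partial u'^p} v'^p.
\end{aligned} \tag{19.74}
\]
The sum of the terms involving second derivatives is zero; this is seen by taking a partial derivative with respect to $u^j$ in the expression
\[
\frac{\partial u^k}{\partial u'^l} \frac{\partial u'^l}{\partial u^i} = \delta^k_i,
\]
which yields
\[
\begin{aligned}
\frac{\partial}{\partial u^j} \left( \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^l}{\partial u^i} \right) &= \frac{\partial^2 u^k}{\partial u'^m \partial u'^l} \frac{\partial u'^m}{\partial u^j} \frac{\partial u'^l}{\partial u^i} + \frac{\partial u^k}{\partial u'^l} \frac{\partial^2 u'^l}{\partial u^i \partial u^j} \\
&= 0.
\end{aligned} \tag{19.75}
\]
Multiplying (19.75) by $(\partial u^i/\partial u'^p)\, v'^p$ shows that the two second-derivative terms in (19.74) cancel. From (19.74) and (19.75), it follows that
\[
\begin{aligned}
v^k_{;j} &= \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^j} \frac{\partial v'^l}{\partial u'^m} + \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^j} \Gamma'^l_{pm} v'^p \\
&= \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^j} \left( \frac{\partial v'^l}{\partial u'^m} + \Gamma'^l_{pm} v'^p \right) \\
&= \frac{\partial u^k}{\partial u'^l} \frac{\partial u'^m}{\partial u^j}\, v'^l_{;m},
\end{aligned}
\]
in which the last factor in the last line, $v'^l_{;m}$, represents the covariant derivative of $v'^l$ with respect to the primed coordinates $u'^m$. Hence, we see that the $v^k_{;j}$ form a second-order tensor of type (1,1). &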

2. Show that the metric tensor is a covariant constant, i.e., the covariant derivative of any component is identically zero: $g_{kp;j} = 0$. This result is known as Ricci's theorem.

Solution: It follows that
\[
\begin{aligned}
g_{kp;j} &= g_{kp,j} - \Gamma^r_{kj} g_{rp} - \Gamma^r_{pj} g_{kr} \\
&= g_{kp,j} - \frac{1}{2} \delta^s_p \left( g_{js,k} + g_{sk,j} - g_{kj,s} \right) - \frac{1}{2} \delta^s_k \left( g_{js,p} + g_{sp,j} - g_{pj,s} \right) \\
&= 0. \quad \&
\end{aligned}
\]


3. Show that $\delta^k_j$ and $g^{kp}$ are also covariant constants.
Solution: We have $\delta^k_{j;i} = \delta^k_{j,i} + \Gamma^k_{li} \delta^l_j - \Gamma^l_{ji} \delta^k_l = \Gamma^k_{ji} - \Gamma^k_{ji} = 0$, which completes our first proof. Next, observe that $\delta^k_j = g_{jp} g^{pk}$ to find the identity
\[
0 = \delta^k_{j;i} = \left( g_{jp} g^{pk} \right)_{;i} = g_{jp;i}\, g^{pk} + g_{jp}\, g^{pk}_{;i}.
\]
Since $g_{jp}$ is a covariant constant, the first term in the last expression has the value zero. Multiplication by $g^{lj}$ produces the desired result. &


Remark. Owing to Ricci's theorem and its two corollaries noted above, the components of the metric tensor can be regarded as constants under covariant differentiation. Thus, e.g.,
\[
\begin{aligned}
g_{ij} A^j_{;k} &= \left( g_{ij} A^j \right)_{;k} = A_{i;k}, \\
g_{ij} T^{jk}_{;l} &= \left( g_{ij} T^{jk} \right)_{;l} = T_i{}^{k}{}_{;l}, \\
T_{ij;k}\, g^{il} g^{jm} &= \left( T_{ij}\, g^{il} g^{jm} \right)_{;k} = T^{lm}_{;k},
\end{aligned}
\]
and so on.


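4. Use (19.70) to find expressions for $\nabla^2 \phi$ and $\nabla \cdot v$ in an orthogonal coordinate system with scale factors $h_i$ (i = 1, 2, 3).
Solution: For an orthogonal coordinate system $\sqrt{g} = h_1 h_2 h_3$; further, $g^{ii} = 1/h_i^2$ for fixed i (no summation) and $g^{ij} = 0$ for $i \neq j$. Therefore, from (19.70), we get
\[
\nabla^2 \phi = \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial u^j} \left( \frac{h_1 h_2 h_3}{h_j^2} \frac{\partial \phi}{\partial u^j} \right),
\]
where summation over j is implied. In a similar manner, we have
\[
\nabla \cdot v = \frac{1}{h_1 h_2 h_3} \frac{\partial}{\partial u^i} \left( h_1 h_2 h_3\, v^i \right). \quad \&
\]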


19.5 Applications in Physics and Engineering 


19.5.1 General Relativity Theory 


It cannot be denied that the general theory of relativity is one of the most famous and beautiful applications of non-Cartesian tensor calculus in physics. This section outlines the concepts one needs in order to understand the general theory of relativity; they are necessary for obtaining the gravitational field equation and the relevant tensorial quantities involved in that equation.

Before proceeding to the argument, let us point out that the notion of geometric curvature, which quantifies the curvature of space at any given point in the space considered, is central to general relativity. In Sect. 19.2.3, we learned that a space is flat locally (or in its entirety) if there exist coordinates $x^i$ such that the line element through a limited region (or the whole) can be written as
\[
(ds)^2 = \epsilon_i (dx^i)^2,
\]
where $\epsilon_i = \pm 1$. However, if we employ a different coordinate system $x'^i$, the line element $(ds)^2$, in general, is not of the above form, but reads
\[
(ds)^2 = g_{ij}\, dx'^i dx'^j
\]
with the appropriate metric tensor $g_{ij}$. Hence, we require a means of identifying a flat space directly from the metric $g_{ij}$, independent of our choice of coordinate system. Such a coordinate-independent way of defining the curvature of a space leads to the field equation of gravity, i.e., Einstein's field equation, described in Sect. 19.5.4.


19.5.2 Riemann Tensor 


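The curvature of space can be quantified in a manner independent of the coordinate system by changing the order of covariant differentiation. Covariant differentiation is a generalization of partial differentiation; unlike partial differentiation, however, interchanging the order of covariant differentiation generally changes the result. To illustrate this, let us consider an arbitrary vector field with covariant components $v_i$. The covariant derivative of $v_i$ is given by [see (19.64)]
\[
v_{i;j} = \frac{\partial v_i}{\partial u^j} - \Gamma^l_{ij} v_l.
\]
A second covariant differentiation then yields
\[
\begin{aligned}
(v_{i;j})_{;k} &= \frac{\partial (v_{i;j})}{\partial u^k} - \Gamma^m_{ik} v_{m;j} - \Gamma^m_{jk} v_{i;m} \\
&= \frac{\partial^2 v_i}{\partial u^j \partial u^k} - \frac{\partial \Gamma^l_{ij}}{\partial u^k} v_l - \Gamma^l_{ij} \frac{\partial v_l}{\partial u^k} - \Gamma^m_{ik} \left( \frac{\partial v_m}{\partial u^j} - \Gamma^l_{mj} v_l \right) - \Gamma^m_{jk} \left( \frac{\partial v_i}{\partial u^m} - \Gamma^l_{im} v_l \right).
\end{aligned}
\]
Interchanging the indices j and k gives the corresponding expression for $(v_{i;k})_{;j}$; subtracting it from the above relation gives us
\[
(v_{i;j})_{;k} - (v_{i;k})_{;j} = R^l_{ijk} v_l,
\]
where
\[
R^l_{ijk} \equiv \frac{\partial \Gamma^l_{ik}}{\partial u^j} - \frac{\partial \Gamma^l_{ij}}{\partial u^k} + \Gamma^m_{ik} \Gamma^l_{mj} - \Gamma^m_{ij} \Gamma^l_{mk}. \tag{19.76}
\]
The quantity $R^l_{ijk}$ on the left-hand side of (19.76) is called the Riemann tensor (or curvature tensor). Since the Christoffel symbols $\Gamma^k_{ij}$ are functions of the metric tensor $g_{ij}$, (19.76) indicates that the Riemann tensor is defined in terms of the metric tensor and its first and second derivatives.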


Recall that if the space being considered is flat, we may choose coordinates such that $\Gamma^k_{ij}$ and its derivatives vanish. Therefore, we have
\[
R^l_{ijk} = 0 \tag{19.77}
\]
at every point in the flat region. In fact, it is possible to show that (19.77) is a necessary and sufficient condition for the region of a space to be flat. Consequently, we conclude the following: when the Riemann tensor satisfies (19.77), the region of the space is flat, and when it does not satisfy (19.77), the region is curved.
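The flatness criterion (19.77) can be illustrated with a small numerical sketch that evaluates (19.76) from given Christoffel symbols. Two test cases are assumed here, neither of which is taken from the text: the Euclidean plane in polar coordinates $(r, \theta)$, which is flat, and the unit sphere in coordinates $(\theta, \varphi)$, which is curved. The Christoffel symbols entered below are the standard ones for these two cases; derivatives are approximated by central differences, and all function names are illustrative.

```python
import math

def zeros3(n):
    return [[[0.0] * n for _ in range(n)] for _ in range(n)]

def gamma_polar(u):
    """G[l][i][k] = Gamma^l_{ik} for the plane in polar coordinates (r, theta)."""
    r, _theta = u
    G = zeros3(2)
    G[0][1][1] = -r
    G[1][0][1] = G[1][1][0] = 1.0 / r
    return G

def gamma_sphere(u):
    """G[l][i][k] = Gamma^l_{ik} for the unit sphere in coordinates (theta, phi)."""
    theta, _phi = u
    G = zeros3(2)
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G

def riemann(gamma, u, h=1e-5):
    """Return R[l][i][j][k] = R^l_{ijk} as defined in (19.76)."""
    n = len(u)
    G = gamma(u)

    def dG(l, i, k, j):
        # central-difference approximation of d Gamma^l_{ik} / d u^j
        up, um = list(u), list(u)
        up[j] += h
        um[j] -= h
        return (gamma(up)[l][i][k] - gamma(um)[l][i][k]) / (2.0 * h)

    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for l in range(n):
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    R[l][i][j][k] = (
                        dG(l, i, k, j) - dG(l, i, j, k)
                        + sum(G[m][i][k] * G[l][m][j] for m in range(n))
                        - sum(G[m][i][j] * G[l][m][k] for m in range(n))
                    )
    return R

R_plane = riemann(gamma_polar, [2.0, 0.7])
R_sphere = riemann(gamma_sphere, [1.0, 0.3])

flat_max = max(abs(R_plane[l][i][j][k])
               for l in range(2) for i in range(2)
               for j in range(2) for k in range(2))
print(flat_max)               # zero up to finite-difference error
print(R_sphere[0][1][0][1])   # nonzero; equals sin(theta)^2 on the unit sphere
```

Although the polar Christoffel symbols do not vanish, every component of the Riemann tensor does, which reflects the coordinate-independent character of the criterion.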

Two relevant quantities are obtained by contracting the Riemann tensor. One is the Ricci tensor defined by
\[
R_{ij} \equiv R^k_{ijk},
\]
and the other is the scalar curvature (or Ricci scalar) given by
\[
R \equiv g^{ij} R_{ij} = R^i_i.
\]
These two quantities are important for introducing the Einstein tensor
\[
G_{ij} \equiv R_{ij} - \frac{1}{2} g_{ij} R,
\]
which describes the space-time curvature in the field equation of general relativity.


19.5.3 Energy-Momentum Tensor 


We now wish to determine the form of the gravitational field equation that, in the weak-field limit of a static gravitational field, reduces to the classical Newtonian field equation of gravity described by
\[
\nabla^2 \Phi = 4\pi G \rho. \tag{19.78}
\]
Here, $\Phi$ is the potential field that corresponds to the space-time curvature in relativistic theory, G is the universal gravitational constant, and $\rho$ is the mass-density distribution of matter. Note that (19.78) is a form of Poisson's equation with $4\pi G \rho$ as the source term. This implies the presence of a corresponding source term associated with the space-time curvature in Einstein's field equation. This source term is given by the energy-momentum tensor $T^{ij}$ defined by
\[
T^{ij} \equiv \rho\, u^i u^j.
\]
Here, $\rho$ is the density of matter and $u^i$ is the four-velocity represented by $u^i = (u^0, u^1, u^2, u^3) = (\gamma c, \gamma v)$, where c is the velocity of light, v is the three-dimensional (nonrelativistic) velocity of a particle, and $\gamma = (1 - v^2/c^2)^{-1/2}$.

The physical interpretations of the components of the energy-momentum tensor are:

$T^{00}$ : the energy density of the particles.

$T^{0i}$ : the energy flux (the heat conduction) in the ith direction.

$T^{i0}$ : the momentum density in the ith direction.

$T^{ij}$ : the flow of the ith-component momentum in the jth direction (i.e., the random thermal motions giving rise to viscous stress).


19.5.4 Einstein Field Equation 


The ingredients necessary to obtain Einstein's field equation, which relates the geometric space-time curvature to the density of mass-energy, are already on hand. One side of the equation should comprise the measure of the density of mass-energy, i.e., the stress-energy tensor $T_{ij}$, and the other side should consist of a measure of the curvature involving the Ricci curvature $R_{ij}$ and scalar curvature R. By making this equation consistent with Newton's equation of motion in the limit of a weak gravitational force as well as with several postulates from a physical standpoint, Einstein's field equation is obtained in the following form:
\[
R_{ij} - \frac{1}{2} g_{ij} R = \frac{8\pi G}{c^4}\, T_{ij}. \tag{19.79}
\]
Given the matter source $T_{ij}$, this tensor equation is composed of ten partial differential equations for the metric tensor $g_{ij}(x)$. The tensor equation is analogous to the Maxwell equations, which determine the electromagnetic field given the charge and current densities (see Sect. 18.5.3). Unlike the Maxwell equations, however, the differential equations of gravitational theory are nonlinear, which makes them very difficult to solve. Surprisingly, despite the nonlinearity, a number of exact solutions have been obtained owing to the presence of symmetries in space-time, which restrict the possible forms of the metric.


Remark. Einstein’s field equation in (19.79) is the most fundamental equation 
in classical physics. The explicit form of the equation can be derived from a 
few arguments. However, it cannot be derived from other physical principles 
since there is no theory that is more fundamental. 


20 


Tensor as Mapping 


Abstract In this chapter, we show that tensors can be identified with mathemat- 
ical operators that transform elements from one abstract vector space to another. 
This viewpoint on tensors is apparently different from those presented in Chaps. 18 
and 19, where tensors have been identified as sets of index quantities subject to a 
transformation law under changes of coordinate systems. However, the viewpoint 
presented here turns out to be consistent with those presented in the previous two 
chapters when we introduce the concept of inner product into the abstract vector 
space (Sect. 20.3.4). 


20.1 Vector as a Linear Function 


20.1.1 Overview 


In Chaps. 18 and 19 tensors are defined as collections of index quantities that obey characteristic transformation laws under a change of coordinate systems. In this chapter we present an alternative definition of tensors that does not require specification of a coordinate system, so that it is suitable for more general tensor analyses describing geometric properties of abstract vector spaces other than our familiar three-dimensional Euclidean space.

The crucial point is that in this alternative definition, a tensor is considered not as a set of index quantities but as an operator (linear function or mapping) acting on vector spaces. For instance, a second-order tensor T is identified with a function, linear in each argument, that associates two vectors v and w with a real number $c \in \mathbb{R}$, which is symbolized by
\[
T(v, w) = c.
\]


Emphasis should be placed on the fact that such a generalized definition of tensors applies to all kinds of general (finite-dimensional) vector spaces, regardless of whether or not they possess geometric properties such as the distance, norm, or inner product of their elements (see Sect. 4.2.1). In fact, the tensors we discussed earlier belong to a specific class of more general tensors, in the sense that they were defined solely on the three-dimensional Euclidean space, a particular class of vector spaces endowed with the inner product property. However, we shall see that the concept of tensor can be extended beyond inner product spaces by introducing the more general definition referred to above.

Throughout the following discussion, we restrict our arguments to finite-dimensional vector spaces over $\mathbb{R}$ in order to provide a minimal course for general tensor calculus.


20.1.2 Vector Spaces Revisited 


To begin with, we briefly review the definition of abstract vector spaces. A vector space (or linear space) V over $\mathbb{R}$ is a set of elements called vectors that have two operations, addition and scalar multiplication, and a distinguished element $0 \in V$. Here, addition (denoted by +) assigns to each pair of elements $v, w \in V$ a third element $v + w \in V$, and scalar multiplication assigns an element $cv \in V$ to each $v \in V$ and $c \in \mathbb{R}$. By definition, all of the elements $v, w, x \in V$ and all $a, b \in \mathbb{R}$ must satisfy the following axioms.

1. The commutative law for +, i.e., $v + w = w + v$.
2. The associative law for +, i.e., $(v + w) + x = v + (w + x)$.
3. Existence of identity for +, i.e., $v + 0 = v$.
4. Existence of negatives, i.e., there is $-v$ such that $v + (-v) = 0$.
5. $a(v + w) = av + aw$.
6. $(a + b)v = av + bv$.
7. $(ab)v = a(bv)$.
8. $1v = v$.

Given two vector spaces V and W, it is possible to set a function f so that
\[
f : V \to W.
\]
The function f is called a linear function (or linear mapping) of V into W if for all $v_1, v_2 \in V$ and $c \in \mathbb{R}$ it yields
\[
f(v_1 + v_2) = f(v_1) + f(v_2), \qquad f(c v_1) = c f(v_1).
\]


20.1.3 Vector Spaces of Linear Functions 
In elementary calculus, the concepts of vectors and linear functions are distinguished from one another: vectors are elements of a vector space and linear functions provide a correspondence between them. However, in view of the axioms 1 to 8 above, we observe that the set of linear functions $f, g, \cdots$ of V into W also forms a vector space in which addition and scalar multiplication, respectively, are defined by
\[
(f + g)(v) = f(v) + g(v), \tag{20.1}
\]
and
\[
(cf)(v) = c f(v), \tag{20.2}
\]
where $v \in V$ and $f(v), g(v) \in W$. We denote by $\mathcal{L}(V, W)$ the vector space formed by the set of linear functions
\[
f : V \to W.
\]
It is a trivial matter to verify that $f + g$ and $cf$ are also linear functions and so belong to the same vector space $\mathcal{L}(V, W)$. These arguments are summarized by the following important theorem:


@ Vector space of linear functions:
Let V and W be vector spaces. The set of linear functions $f : V \to W$ forms a vector space denoted by $\mathcal{L}(V, W)$.


This theorem states that the linear functions $f_1, f_2, \cdots$ of V into W are elements of a vector space $\mathcal{L}(V, W)$, analogous to vectors $v_1, v_2, \cdots$ being elements of a vector space V. This analogy implies that a linear function $f \in \mathcal{L}(V, W)$ can be regarded as a vector and, conversely, that a vector $v \in V$ can be regarded as a linear function. Such an identification of vectors and linear functions is crucially important for obtaining a generalized definition of tensors that is free of the concept of inner product and the specification of a coordinate system.
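The operations (20.1) and (20.2) can be tried out directly by treating linear maps as ordinary functions. In the sketch below the two maps `f` and `g` on $\mathbb{R}^2$, the helper names `add` and `scale`, and the sample vectors are all hypothetical choices made for illustration; the point is only that the combination $f + 2g$ built pointwise is again additive and homogeneous.

```python
def add(f, g):
    """Pointwise sum (f + g)(v) = f(v) + g(v), as in (20.1)."""
    return lambda v: tuple(a + b for a, b in zip(f(v), g(v)))

def scale(c, f):
    """Scalar multiple (c f)(v) = c f(v), as in (20.2)."""
    return lambda v: tuple(c * a for a in f(v))

# Two sample linear maps R^2 -> R^2 (hypothetical examples).
f = lambda v: (v[0] + 2 * v[1], 3 * v[1])
g = lambda v: (-v[1], v[0])

k = add(f, scale(2, g))   # the element f + 2g of L(V, W)

v1, v2, c = (1, 2), (3, -1), 5
v_sum = tuple(a + b for a, b in zip(v1, v2))

# Check the two defining properties of a linear function for k.
additive = k(v_sum) == tuple(a + b for a, b in zip(k(v1), k(v2)))
homogeneous = k(tuple(c * a for a in v1)) == tuple(c * a for a in k(v1))
print(additive, homogeneous)
```

Integer inputs are used so that the equality checks are exact rather than subject to rounding.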


20.1.4 Dual Spaces 
Let V* denote the set of all linear functions
\[
f : V \to \mathbb{R}.
\]
(Note that the asterisk (*) in V* does not mean complex conjugate.) Then, since
\[
V^* = \mathcal{L}(V, \mathbb{R}),
\]
it follows that V* is a vector space. The vector space V* is called the dual space (or conjugate space) of V, whose elements $f \in V^*$ associate a vector $v \in V$ with a real number $c \in \mathbb{R}$, symbolized as
\[
f(v) = c.
\]
Particularly important elements of V* are the linear functions
\[
\varepsilon^i : V \to \mathbb{R} \quad (i = 1, 2, \cdots, n)
\]
that associate a basis vector $e_i \in V$ with the unit number 1. In fact, a set of such linear functions $\{\varepsilon^j\}$ serves as a basis of the dual space V* as stated below.


@ Dual basis:
For each basis $\{e_i\}$ for V, there is a unique basis $\{\varepsilon^j\}$ for V* such that
\[
\varepsilon^j(e_i) = \delta^j_i. \tag{20.3}
\]
The linear functions $\varepsilon^j : V \to \mathbb{R}$ defined by (20.3) make up the dual basis to the basis $\{e_i\}$ of V.


Proof Let us verify that the set $\{\varepsilon^j\}$ defined by (20.3) serves as a basis of V*. Recall that in finite dimensions, a basis of a vector space V is defined as a set of linearly independent vectors that spans all of V. To show linear independence, we assume that $a_j \varepsilon^j = 0$. Then we have
\[
a_j \varepsilon^j(e_i) = a_j \delta^j_i = a_i = 0 \quad \text{for all } i,
\]
which implies that $\{\varepsilon^j\}$ is linearly independent. To show that $\{\varepsilon^j\}$ spans V*, note that for any $f \in V^*$ and any $v = v^i e_i$, linearity and (20.3) give $f(v) = v^i f(e_i) = f(e_i)\, \varepsilon^i(v)$, so that $f = f(e_i)\, \varepsilon^i$. &


Remark. Raising of the index j attached to $\varepsilon^j$ is intentional, as this convention is necessary to provide a consistent notation of components of generalized tensors, demonstrated in Sect. 20.3.


Examples Expand a vector $v \in V$ as
\[
v = v^i e_i,
\]
to find that
\[
\varepsilon^j(v) = \varepsilon^j(v^i e_i) = v^i \varepsilon^j(e_i) = v^i \delta^j_i = v^j.
\]
This indicates that $\varepsilon^j$ is the linear function that picks out the jth component of v with respect to the basis $\{e_i\}$.
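For a concrete feel for (20.3), the dual basis of a non-orthogonal basis of $\mathbb{R}^2$ can be computed by inverting the matrix whose columns are the basis vectors; the rows of the inverse are then the dual covectors. This is a minimal sketch under those assumptions: covectors are stored as coefficient pairs acting by $a x + b y$, and the basis and the helper names are hypothetical.

```python
def dual_basis(e1, e2):
    """Dual basis of the basis (e1, e2) of R^2.

    Each covector is returned as a pair (a, b) acting on v = (x, y)
    by a*x + b*y; the pairs are the rows of the inverse of the matrix
    whose columns are e1 and e2.
    """
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 are linearly dependent")
    eps1 = (e2[1] / det, -e2[0] / det)
    eps2 = (-e1[1] / det, e1[0] / det)
    return eps1, eps2

def apply(covector, v):
    """Evaluate a covector on a vector."""
    return covector[0] * v[0] + covector[1] * v[1]

e1, e2 = (2, 1), (1, 1)          # a non-orthogonal basis of R^2
eps1, eps2 = dual_basis(e1, e2)

# Duality relations eps^j(e_i) = delta^j_i
pairings = [[apply(eps, e) for e in (e1, e2)] for eps in (eps1, eps2)]

# Components of v = 3 e1 - 2 e2 recovered as eps^j(v)
v = (3 * e1[0] - 2 * e2[0], 3 * e1[1] - 2 * e2[1])
components = (apply(eps1, v), apply(eps2, v))
print(pairings, components)
```

The recovered components are the coefficients 3 and -2 used to build v, in line with $\varepsilon^j(v) = v^j$.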


20.1.5 Equivalence Between Vectors and Linear Functions 


If V is a vector space and $\tau \in V^*$, then $\tau$ is a function of the variable $v \in V$ that generates a real number denoted by $\tau(v)$. Owing to the identification of vectors and linear functions, however, it is possible to reverse our reasoning and consider v as a function of the variable $\tau$, again with the real value $v(\tau) = \tau(v)$. When we take this approach, v is a linear function on V*.


Remark. The two views contrasted above are both asymmetric, but this asymmetry can be eliminated by introducing the notation
\[
\langle\ ,\ \rangle : V \times V^* \to \mathbb{R},
\]
which gives
\[
\langle v, \tau \rangle = \tau(v) = v(\tau) \in \mathbb{R}.
\]
Here $\langle\ ,\ \rangle$ is a function of two variables v and $\tau$, called the natural pairing of V and V* into $\mathbb{R}$. It is easy to verify that $\langle\ ,\ \rangle$ is bilinear.


The concepts and notation introduced in Sects. 20.1.3 and 20.1.4 serve as preliminaries for the discussions in the following sections.


20.2 Tensor as Multilinear Function 


20.2.1 Direct Product of Vector Spaces 


To arrive at the new definition of tensors we are seeking requires three more concepts, presented in Sects. 20.2.1-20.2.3.

The first is the direct product of vector spaces; if V and W are vector spaces, then we can establish a new vector space by forming the direct product (or Cartesian product) $V \times W$ of the two spaces. The direct product $V \times W$ consists of ordered pairs (v, w) with $v \in V$ and $w \in W$, as symbolized by
\[
V \times W = \{ (v, w) \mid v \in V,\ w \in W \}.
\]
The addition and scalar multiplication of the elements are defined by
\[
\begin{aligned}
(v, w_1) + (v, w_2) &= (v, w_1 + w_2), \\
(v_1, w) + (v_2, w) &= (v_1 + v_2, w), \\
c(v, w) &= (cv, w) = (v, cw).
\end{aligned}
\]
The linear dimension of the resulting vector space $V \times W$ equals the product of the linear dimensions of V and W. An element (v, w) of the direct product $V \times W$ is sometimes denoted by vw.


Remark. The reader should note a distinction between the direct product $V \times W$ and the direct sum $V + W$ of the two vector spaces. A direct sum $V + W$ consists of all pairs (v, w) = (w, v) with $v \in V$ and $w \in W$ for which addition and scalar multiplication are defined by
\[
(v_1, w_1) + (v_2, w_2) = (v_1 + v_2, w_1 + w_2), \qquad c(v, w) = (cv, cw).
\]
The linear dimension is thus equal to the sum of the dimensions of V and W. Every linear vector space of dimension greater than one can be represented by a direct sum of nonintersecting subspaces.


20.2.2 Multilinear Functions 

Let $V_1$, $V_2$, and W be vector spaces. A function
\[
f : V_1 \times V_2 \to W
\]
is called bilinear if it is linear in each variable, i.e., if
\[
\begin{aligned}
f(a v_1 + b v_1', v_2) &= a f(v_1, v_2) + b f(v_1', v_2), \\
f(v_1, a v_2 + b v_2') &= a f(v_1, v_2) + b f(v_1, v_2').
\end{aligned}
\]
The extension of this definition to functions of more than two variables is simple. Indeed, functions such as
\[
f : V_1 \times V_2 \times \cdots \times V_n \to W \tag{20.4}
\]
are called multilinear functions, more specifically n-linear functions, for which the defining relation is
\[
f(v_1, \cdots, a v_i + b v_i', \cdots, v_n) = a f(v_1, \cdots, v_i, \cdots, v_n) + b f(v_1, \cdots, v_i', \cdots, v_n).
\]
An n-linear function can be multiplied by a scalar and two n-linear functions can be added; in each case the result is an n-linear function. Thus, the set of n-linear functions given in (20.4) forms a vector space denoted by $\mathcal{L}(V_1 \times \cdots \times V_n, W)$.


20.2.3 Tensor Product 


Suppose that $\tau^1 \in V_1^*$ and $\tau^2 \in V_2^*$, i.e., $\tau^1$ and $\tau^2$ are linear real-valued functions on $V_1$ and $V_2$, respectively. We can then form a bilinear real-valued function such as
\[
\tau^1 \otimes \tau^2 : V_1 \times V_2 \to \mathbb{R},
\]
which is represented by
\[
\tau^1 \otimes \tau^2 (v_1, v_2) = \tau^1(v_1)\, \tau^2(v_2). \tag{20.5}
\]
Note that the right-hand side of (20.5) is just the product of two real numbers: $\tau^1(v_1)$ and $\tau^2(v_2)$. The bilinear function $\tau^1 \otimes \tau^2$ is called the tensor product of $\tau^1$ and $\tau^2$. Clearly, since $\tau^1$ and $\tau^2$ are separately linear, $\tau^1 \otimes \tau^2$ is linear in each argument. Hence, the tensor products $\tau^1 \otimes \tau^2$ are elements of, and in finite dimensions span, the vector space $\mathcal{L}(V_1 \times V_2, \mathbb{R})$.

Recall that the vectors $v \in V$ can be regarded as linear functions acting on V*. In this context, we can also construct tensor products of two vectors. For example, let $v_1 \in V_1$ and $v_2 \in V_2$ and define the tensor product
\[
v_1 \otimes v_2 : V_1^* \times V_2^* \to \mathbb{R}
\]
by
\[
v_1 \otimes v_2 (\tau^1, \tau^2) = v_1(\tau^1)\, v_2(\tau^2) = \tau^1(v_1)\, \tau^2(v_2). \tag{20.6}
\]
This shows that the tensor product $v_1 \otimes v_2$ can be considered a bilinear function acting on $V_1^* \times V_2^*$, similar to $\tau^1 \otimes \tau^2$ being a bilinear function on $V_1 \times V_2$, which indicates that the $v_1 \otimes v_2$ belong to the vector space $\mathcal{L}(V_1^* \times V_2^*, \mathbb{R})$.

Furthermore, given a vector space V, we can construct mixed types of tensor products such as
\[
v \otimes \tau : V^* \times V \to \mathbb{R}
\]
given by
\[
v \otimes \tau (\phi, u) = v(\phi)\, \tau(u) = \phi(v)\, u(\tau), \tag{20.7}
\]
where $u, v \in V$ and $\phi, \tau \in V^*$. In a straightforward extension, it is possible to develop tensor products of more than two linear functions or vectors such as
\[
v_1 \otimes v_2 \otimes \cdots \otimes v_r \otimes \tau^1 \otimes \tau^2 \otimes \cdots \otimes \tau^s, \tag{20.8}
\]
which act on the vector space
\[
V^* \times V^* \times \cdots \times V^* \times V \times V \times \cdots \times V,
\]
where V* appears r times and V appears s times. Similar to the previous cases, the tensor products (20.8) belong to the vector space denoted by
\[
\mathcal{L}[(V^*)^r \times V^s, \mathbb{R}],
\]
where $(V^*)^r$ and $V^s$ are direct products of V* with r factors and those of V with s factors, respectively.


20.2.4 General Definition of Tensors 


We finally arrive at the following generalized definition of a tensor.


@ Tensor:
Let V be a vector space with a dual space V*. Then a tensor of type (r, s), denoted by $T^r_s$, is a multilinear function
\[
T^r_s : (V^*)^r \times V^s \to \mathbb{R}.
\]
The number r is called the contravariant degree of the tensor, and s is called the covariant degree of the tensor.


@ Tensor space:
The set of all tensors $T^r_s$ for fixed r and s forms a vector space, called a tensor space, denoted by
\[
\mathcal{T}^r_s(V) \equiv \mathcal{L}[(V^*)^r \times V^s, \mathbb{R}].
\]


As an example, let $v_1, \cdots, v_r \in V$ and $\tau^1, \cdots, \tau^s \in V^*$ and define the tensor product (i.e., multilinear function)
\[
T^r_s \equiv v_1 \otimes \cdots \otimes v_r \otimes \tau^1 \otimes \cdots \otimes \tau^s, \tag{20.9}
\]
which yields for $\theta^1, \cdots, \theta^r \in V^*$ and $u_1, \cdots, u_s \in V$,
\[
\begin{aligned}
v_1 \otimes \cdots \otimes v_r \otimes \tau^1 \otimes \cdots \otimes \tau^s (\theta^1, \cdots, \theta^r, u_1, \cdots, u_s)
&= v_1(\theta^1) \cdots v_r(\theta^r)\, \tau^1(u_1) \cdots \tau^s(u_s) \\
&= \prod_{i=1}^{r} v_i(\theta^i) \prod_{j=1}^{s} \tau^j(u_j).
\end{aligned} \tag{20.10}
\]
Observe that each v in the tensor product (20.10) requires an element $\theta \in V^*$ to produce a real number, which is why the number of factors of V* in the direct product $(V^*)^r \times V^s$ equals the number of v's in the tensor product (20.9).

In particular, a tensor of type (0,0) is defined as a scalar, so $\mathcal{T}^0_0(V) = \mathbb{R}$; a tensor of type (1,0), an ordinary vector, is called a contravariant vector; and one of type (0,1), a linear function, is called a covariant vector. More generally, a tensor of type (r, 0) is called a contravariant tensor of rank (or degree) r and one of type (0, s) is called a covariant tensor of rank (or degree) s.


Remark. We can form a tensor product of two tensors $T^r_s$ and $U^{r'}_{s'}$ such as
\[
T^r_s \otimes U^{r'}_{s'} : (V^*)^{r+r'} \times V^{s+s'} \to \mathbb{R},
\]
which is a natural generalization of tensor products given in (20.5), (20.6), and (20.7). It is easy to prove that the tensor product is associative and distributive over tensor addition, but not commutative.
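The defining relation (20.5) and the lack of commutativity can be illustrated with a few lines of code. In this sketch, covectors on $\mathbb{R}^2$ are represented by coefficient tuples; the helper names `covector` and `tensor_product` and all numerical values are assumptions made for the example.

```python
def covector(a):
    """Linear function on R^n with coefficients a: v -> sum_i a_i v_i."""
    return lambda v: sum(ai * vi for ai, vi in zip(a, v))

def tensor_product(tau1, tau2):
    """Bilinear function (v1, v2) -> tau1(v1) * tau2(v2), as in (20.5)."""
    return lambda v1, v2: tau1(v1) * tau2(v2)

tau1 = covector((1, 2))
tau2 = covector((0, 3))
t12 = tensor_product(tau1, tau2)   # tau1 (x) tau2
t21 = tensor_product(tau2, tau1)   # tau2 (x) tau1

v, v_alt, w = (1, 1), (2, -1), (4, 5)

# Linearity in the first slot: t12(2v + 3v_alt, w) = 2 t12(v, w) + 3 t12(v_alt, w)
combo = tuple(2 * a + 3 * b for a, b in zip(v, v_alt))
lhs = t12(combo, w)
rhs = 2 * t12(v, w) + 3 * t12(v_alt, w)

# The two orderings generally give different numbers on the same arguments.
print(lhs, rhs, t12(v, w), t21(v, w))
```

Linearity in the second slot can be checked in the same way by varying `w` instead of `v`.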


20.3 Components of Tensors 


20.3.1 Basis of a Tensor Space 


In physical applications of tensor calculus, it is necessary to choose a basis for the vector space V and one for its dual space V* to represent the tensors by a set of real numbers (i.e., components). The need for this process is analogous to the cases of elementary vector calculus, in which linear operators are often represented by arrays of numbers, i.e., by matrices referring to a chosen basis of the space. A basis of our tensor space $\mathcal{T}^r_s(V) \equiv \mathcal{L}[(V^*)^r \times V^s, \mathbb{R}]$ is defined as follows.


@ Basis for the tensor space:
Let $\{e_i\}$ and $\{\varepsilon^j\}$ be a basis in V and V*, respectively. Then, a basis for the tensor space $\mathcal{T}^r_s(V)$ is the set of all tensor products:
\[
e_{i_1} \otimes \cdots \otimes e_{i_r} \otimes \varepsilon^{j_1} \otimes \cdots \otimes \varepsilon^{j_s}. \tag{20.11}
\]


@ Components of a tensor:
The components of any tensor $A \in \mathcal{T}^r_s(V)$ are the real numbers given by
\[
A^{i_1 \cdots i_r}_{j_1 \cdots j_s} = A\left( \varepsilon^{i_1}, \cdots, \varepsilon^{i_r}, e_{j_1}, \cdots, e_{j_s} \right).
\]


Remark. 1. A useful result of the theorem is the relation
\[
A = A^{i_1 \cdots i_r}_{j_1 \cdots j_s}\; e_{i_1} \otimes \cdots \otimes e_{i_r} \otimes \varepsilon^{j_1} \otimes \cdots \otimes \varepsilon^{j_s}.
\]

2. Note that for every factor in the basis of $\mathcal{T}^r_s(V)$, there are N possibilities. (For instance, we have N choices for $e_{i_1}$, in which $i_1 = 1, 2, \cdots, N$.) Thus, the number of possible tensor products represented by (20.11) is $N^{r+s}$.


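Examples 1. A tensor space $\mathcal{T}^1_0(V)$ has a basis $\{e_i\}$ so that an element (i.e., a contravariant vector) $v \in \mathcal{T}^1_0(V)$ can be expanded by
\[
v = v^i e_i,
\]
where the real numbers $v^i = v(\varepsilon^i)$ are called the components of $v : V^* \to \mathbb{R}$.


2. A tensor space $\mathcal{T}^0_1(V)$ has a basis $\{\varepsilon^j\}$ so that an element (i.e., a covariant vector) $\tau \in \mathcal{T}^0_1(V)$ can be expanded by
\[
\tau = \tau_j \varepsilon^j,
\]
where the real numbers $\tau_j = \tau(e_j)$ are called the components of $\tau : V \to \mathbb{R}$.


3. A tensor space $\mathcal{T}^2_1(V)$ has a basis $\{e_i \otimes e_j \otimes \varepsilon^k\}$ so that for any $A \in \mathcal{T}^2_1(V)$ we have
\[
A = A^{ij}_{k}\; e_i \otimes e_j \otimes \varepsilon^k,
\]
where the real numbers
\[
A^{ij}_{k} = A\left( \varepsilon^i, \varepsilon^j, e_k \right)
\]
are the components of $A : V^* \times V^* \times V \to \mathbb{R}$.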


20.3.2 Transformation Laws of Tensors 


The components of a tensor depend on the basis in which they are de- 
scribed. If the basis is changed, the components change. The relation between 
components of a tensor in different bases is called the transformation law 
for that particular tensor. Let us investigate this concept. 

Assume two different bases of V, denoted by {e;} and {e’;}. Similarly, we 
denote by {e/} and {e"} two different bases of V*. We can find appropriate 
transformation matrices [R?] and [Sf] that satisfy 


e'; = Rie; and ec! = She! 
Then, for a tensor T of type (1,2), we have 
De =T Ce é' 5, ex) =F ( Sie", Rp em, Rien) 
= SERT RET (e*, Em, en) 
= SRO nee 


mn? 


(20.12) 


which is the transformation law of the components of the tensor T' of type 
(1,2). 

Remember that in the coordinate-dependent treatment of tensors (as 
shown in Chaps. 18 and 19), the result (20.12) was considered to be the 
defining relation for a tensor of type (1,2). In other words, a tensor of type 
(1,2) was defined as a collection of numbers $T^i{}_{jk}$ that transform to another
collection of numbers $T'^i{}_{jk}$ according to the rule in (20.12) when the basis is
changed. In our current (i.e., coordinate-free) treatment of tensors, it is not 
necessary to introduce a basis to define a tensor; a basis must be introduced 
only when the components of a tensor are needed. The advantage of this 
approach is obvious, since a (1,2)—type tensor has 27 components in three 
dimensions and 64 components in four dimensions, and all of these can be 
represented by the single symbol T. 


Remark. Note that the above arguments do not downplay the role of compo- 
nents. In fact, when it comes to actual calculations, we are forced to choose a 
basis and manipulate the components. 


20.3.3 Natural Isomorphism 


We comment below on an important property that is specific to components of
tensors $A \in T^1_1(V)$. We know that tensors $A \in T^1_1(V)$ are bilinear functions
such as

$$A: V^* \times V \to R$$

and that their components $A^i{}_j$ are defined by

$$A^i{}_j = A(\varepsilon^i, e_j), \qquad (20.13)$$

where $\{\varepsilon^i\}$ and $\{e_j\}$ are bases of $V^*$ and $V$, respectively. Now we consider
the matrix

$$[A^i{}_j] = \begin{pmatrix} A^1{}_1 & A^1{}_2 & \cdots & A^1{}_n \\ A^2{}_1 & A^2{}_2 & \cdots & A^2{}_n \\ \vdots & \vdots & & \vdots \\ A^n{}_1 & A^n{}_2 & \cdots & A^n{}_n \end{pmatrix}, \qquad (20.14)$$

whose elements $A^i{}_j$ are the same as those given in (20.13). We shall see that
(20.14) is the matrix representation of a linear operator $A$ in terms of the
basis $\{e_i\}$ of $V$, which associates a vector $v \in V$ with another $u \in V$, i.e.,

$$A: V \to V.$$

A formal statement concerning this point is given below.
@ Natural isomorphism: 
For any vector space $V$, there is a one-to-one correspondence (called a
natural isomorphism) between a tensor $A \in T^1_1(V)$ and a linear operator
$A \in \mathcal{L}(V, V)$.


Proof We write the tensor $A \in T^1_1(V)$ as

$$A = A^i{}_j\, e_i \otimes \varepsilon^j.$$

Given any $v \in V$, we obtain

$$A(v) = (A^i{}_j\, e_i \otimes \varepsilon^j)(v) = A^i{}_j\, e_i\, [\varepsilon^j(v)]$$
$$= A^i{}_j\, e_i\, [\varepsilon^j(v^k e_k)] = A^i{}_j\, e_i\, (v^k \delta^j_k)$$
$$= A^i{}_j v^j e_i. \qquad (20.15)$$

Observe that the $A^i{}_j v^j$ in the last term are real numbers and that $e_i \in V$.
Hence, the object $A(v)$ is a linear combination of the basis vectors $e_i$ of $V$, i.e.,

$$A(v) \in V.$$

Denoting $A(v)$ in (20.15) by $u = u^i e_i$, we find that

$$u^i = A^i{}_j v^j, \qquad (20.16)$$

in which $u^i$, $v^j$ are contravariant components of the vectors $u, v \in V$, respectively,
in terms of the basis $\{e_i\}$. The result (20.16) is identified with a matrix
equation:

$$\begin{pmatrix} u^1 \\ u^2 \\ \vdots \\ u^n \end{pmatrix} = \begin{pmatrix} A^1{}_1 & A^1{}_2 & \cdots & A^1{}_n \\ A^2{}_1 & A^2{}_2 & \cdots & A^2{}_n \\ \vdots & \vdots & & \vdots \\ A^n{}_1 & A^n{}_2 & \cdots & A^n{}_n \end{pmatrix} \begin{pmatrix} v^1 \\ v^2 \\ \vdots \\ v^n \end{pmatrix}. \qquad (20.17)$$

Thus we can see that given $A \in T^1_1(V)$, its components form the matrix
representation $[A^i{}_j]$ of a linear operator $A$ that transforms a vector $v \in V$ into
another vector $u \in V$.

Conversely, for any given linear operator on $V$ with a matrix representation
$[A^i{}_j]$ in terms of a basis of $V$, there exists a tensor $A \in T^1_1(V)$. This suggests
a one-to-one correspondence between the tensor space $T^1_1(V)$ and the vector
space $\mathcal{L}(V, V)$ comprising the linear mappings $f: V \to V$. ∎


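A parallel discussion serves for a linear operator on $V^*$. In fact, for any
$\tau \in V^*$, we have

$$A(\tau) = (A^i{}_j\, e_i \otimes \varepsilon^j)(\tau) = A^i{}_j\, [e_i(\tau)]\, \varepsilon^j$$
$$= A^i{}_j\, [e_i(\tau_k \varepsilon^k)]\, \varepsilon^j = A^i{}_j\, (\tau_k \delta^k_i)\, \varepsilon^j$$
$$= A^i{}_j \tau_i\, \varepsilon^j,$$

which means that $A(\tau)$ is a linear combination of the basis vectors $\varepsilon^j$ of $V^*$, i.e.,

$$A(\tau) \in V^*.$$

Denoting $A(\tau)$ by $\theta = \theta_j \varepsilon^j$, we obtain

$$\theta_j = A^i{}_j \tau_i, \qquad (20.18)$$

where $\theta_j$ and $\tau_i$ are (covariant) components of the vectors $\theta, \tau \in V^*$ in terms
of the basis $\{\varepsilon^i\}$. Using the same matrix representation $[A^i{}_j]$ as in (20.17),
we can rewrite (20.18) as

$$[\theta_1 \ \cdots \ \theta_n] = [\tau_1 \ \cdots \ \tau_n] \begin{pmatrix} A^1{}_1 & A^1{}_2 & \cdots & A^1{}_n \\ A^2{}_1 & A^2{}_2 & \cdots & A^2{}_n \\ \vdots & \vdots & & \vdots \\ A^n{}_1 & A^n{}_2 & \cdots & A^n{}_n \end{pmatrix},$$

which describes a linear mapping from a vector $\tau \in V^*$ to another vector
$\theta \in V^*$ through the linear operator with the matrix representation $[A^i{}_j]$.

We thus conclude that there is a natural isomorphism among the three
vector spaces:

$$T^1_1(V) \cong \mathcal{L}(V, V) \cong \mathcal{L}(V^*, V^*).$$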


Owing to this isomorphism, these three vector spaces can be treated as though 
they are the same. 
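The three roles of one array of components can be made concrete with a short sketch. The numbers and function names below are hypothetical. The same nested list `A[i][j]` $= A^i{}_j$ acts on a column of contravariant components (as in (20.16)), on a row of covariant components (as in (20.18)), and as a bilinear function on $V^* \times V$.

```python
def act_on_vector(A, v):
    """u^i = A^i_j v^j : the operator V -> V (matrix times column)."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

def act_on_covector(A, tau):
    """theta_j = A^i_j tau_i : the operator V* -> V* (row times matrix)."""
    return [sum(A[i][j] * tau[i] for i in range(len(tau))) for j in range(len(A[0]))]

def bilinear(A, tau, v):
    """A(tau, v) = A^i_j tau_i v^j : the tensor as a map V* x V -> R."""
    return sum(A[i][j] * tau[i] * v[j]
               for i in range(len(tau)) for j in range(len(v)))

A = [[1, 2], [3, 4]]          # A[i][j] = A^i_j (sample values)
v, tau = [5, 6], [7, 8]

u = act_on_vector(A, v)
theta = act_on_covector(A, tau)
pair = lambda cov, vec: sum(c * x for c, x in zip(cov, vec))

# All three numbers coincide: tau(u) = theta(v) = A(tau, v).
print(pair(tau, u), pair(theta, v), bilinear(A, tau, v))
```

The agreement of the three printed values is the component form of the statement that the three spaces can be treated as the same.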


20.3.4 Inner Product in Tensor Language 


As noted at the beginning of this chapter, our current discussion is applicable
to any kind of vector space regardless of whether or not it is endowed with
inner product properties. If the spaces we are dealing with are inner product
spaces, then all the results of Chaps. 18 and 19 are reproduced, owing to the
assumption that only a Euclidean space (i.e., a real inner product space; see
Sect. 19.2.3) is considered there. In this subsection, we shall see that this is
true, but we first have to introduce the concept of inner product in connection
with our current vector spaces. Such a discussion enables us to make the
correspondence between the two views of tensors clear:


tensors are sets of index-quantities (in Chaps. 18 and 19), 
and 


tensors are linear mappings (in Chap. 20). 


Below is the definition of the inner product in the language of tensor 
calculus. 


@ Inner product:
An inner product, denoted by $( \ , \ )$, is a real-valued function such as

$$( \ , \ ): V \times V \to R,$$

which has the following properties:
(i) it is nondegenerate, i.e., $(u, v) = 0$ for all $v$ $\iff$ $u = 0$,
(ii) it is symmetric, i.e., $(u, v) = (v, u)$,
(iii) it is positive definite, i.e., $(u, u) > 0$ whenever $u \neq 0$, and
(iv) it is bilinear, i.e., $(au + bv, w) = a(u, w) + b(v, w)$.


Remark. The set of four axioms above is a restatement of those presented in 
Sect. 4.1.3 for real vector spaces. 


By definition, the inner product of $v$ and $w$ reads

$$(v, w) = (v^i e_i, w^j e_j) = v^i w^j (e_i, e_j),$$

where $(e_i, e_j)$ as well as $(v, w)$ are certain real numbers. Then, if we establish
a matrix $[g_{ij}]$ with the entries

$$g_{ij} = (e_i, e_j), \qquad (20.19)$$

we have

$$(v, w) = g_{ij} v^i w^j,$$

which reproduces the previous notation (18.27) obtained via the coordinate-
dependent treatment of tensors.
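A minimal sketch of (20.19), assuming a non-orthogonal basis of the Euclidean plane chosen only for illustration: the matrix $[g_{ij}]$ is built from dot products of the basis vectors, and $g_{ij} v^i w^j$ is compared with the ordinary dot product of the same two vectors written in Cartesian components.

```python
def dot(a, b):
    """Ordinary Euclidean dot product of Cartesian component lists."""
    return sum(x * y for x, y in zip(a, b))

def metric(basis):
    """g_ij = (e_i, e_j) for a list of basis vectors given in Cartesian form."""
    return [[dot(ei, ej) for ej in basis] for ei in basis]

def to_cartesian(components, basis):
    """v = v^i e_i, returned as Cartesian components."""
    return [sum(c * e[k] for c, e in zip(components, basis))
            for k in range(len(basis[0]))]

basis = [[1, 0], [1, 1]]      # e_1, e_2: an assumed, deliberately skew basis
g = metric(basis)             # [[1, 1], [1, 2]]

v_up, w_up = [2, 3], [1, -1]  # contravariant components v^i, w^j
via_metric = sum(g[i][j] * v_up[i] * w_up[j] for i in range(2) for j in range(2))
via_cartesian = dot(to_cartesian(v_up, basis), to_cartesian(w_up, basis))
print(g, via_metric, via_cartesian)
```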


Remark. The notation in (20.19) seems to imply that the function $( \ , \ )$ is
written in terms of the dual basis $\varepsilon^i \in V^*$ by

$$( \ , \ ) = g_{ij}\, \varepsilon^i \otimes \varepsilon^j. \qquad (20.20)$$

However, this is not the case because (20.20) does not satisfy the symmetric
property required by (ii) in the above definition of an inner product.


20.3.5 Index Lowering and Raising in Tensor Language 


We demonstrate below another important consequence of the notation in
(20.19). Since the inner product $(v, w)$ is a bilinear function with variables $v$
and $w$, it is a linear function of $w$ if we fix $v$. Assume a function $\tilde{v}: V \to R$
defined by

$$\tilde{v}(w) = (v, w).$$

Clearly, $\tilde{v}$ is a linear function of $w$ and $\tilde{v} \in V^*$. Hence, $\tilde{v}$ can be expanded by
the dual basis $\varepsilon^j \in V^*$, which results in

$$\tilde{v} = \tilde{v}_j \varepsilon^j = \tilde{v}(e_j)\, \varepsilon^j = (v, e_j)\, \varepsilon^j = (v^i e_i, e_j)\, \varepsilon^j = v^i g_{ij}\, \varepsilon^j.$$

This indicates that the components $\tilde{v}_j$ of the linear function $\tilde{v}$ are given by

$$\tilde{v}_j = g_{ij} v^i.$$

Denoting $\tilde{v}_j$ by $v_j$, we obtain

$$v_j = g_{ij} v^i, \qquad (20.21)$$

which is identified with the index lowering of $v^i$ by the use of $g_{ij}$. Emphasis is
placed on the fact that the result (20.21) gives a one-to-one relation between
$v \in V$ and $\tilde{v} \in V^*$ via the entries $g_{ij} = (e_i, e_j)$. That is, going from a vector
$v \in V$ to its unique image $\tilde{v} \in V^*$ is achieved by simply lowering the index of
the contravariant component of $v$ through relation (20.21).

The counterpart of (20.21), index raising, is obtained by noting the fact
that by hypothesis the matrix $[g_{ij}]$ is nondegenerate. This implies the existence
of the inverse matrix denoted by $[g^{ij}]$. Multiplying both sides of (20.21) by
the elements $g^{kj}$ yields

$$g^{kj} v_j = g^{kj} g_{ij} v^i = \delta^k_i v^i = v^k,$$

i.e.,

$$v^i = g^{ij} v_j.$$

We have thus shown that the introduction of the matrix $[g_{ij}]$ composed of
the real numbers $g_{ij}$ defined by (20.19) provides a bridge between the two
apparently different viewpoints (those in Chaps. 18, 19 and in Chap. 20)
regarding tensors.
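Index lowering and raising can be exercised numerically. The sketch below assumes a sample symmetric, nondegenerate $2 \times 2$ matrix $[g_{ij}]$; the function names are illustrative. It lowers an index with $g_{ij}$, raises it again with the inverse matrix $[g^{ij}]$, and recovers the original contravariant components exactly, using rational arithmetic.

```python
from fractions import Fraction

def inverse_2x2(g):
    """Inverse of a 2x2 matrix with nonzero determinant (exact rationals)."""
    det = Fraction(g[0][0] * g[1][1] - g[0][1] * g[1][0])
    return [[g[1][1] / det, -g[0][1] / det],
            [-g[1][0] / det, g[0][0] / det]]

def lower(g, v_up):
    """v_j = g_ij v^i (index lowering)."""
    return [sum(g[i][j] * v_up[i] for i in range(len(v_up))) for j in range(len(g))]

def raise_index(g_inv, v_down):
    """v^i = g^ij v_j (index raising)."""
    return [sum(g_inv[i][j] * v_down[j] for j in range(len(v_down)))
            for i in range(len(g_inv))]

g = [[1, 1], [1, 2]]               # sample metric g_ij = (e_i, e_j)
g_inv = inverse_2x2(g)             # [[2, -1], [-1, 1]]
v_up = [2, 3]
v_down = lower(g, v_up)            # [5, 8]
print(v_down, raise_index(g_inv, v_down))
```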


Part VII 


Appendixes 


A 


Proof of the Bolzano—Weierstrass Theorem 


A.1 Limit Points 


In this appendix we prove the Bolzano—Weierstrass theorem, first intro- 
duced in Sect. 2.2.2, which guarantees the existence of a limit point in some 
sets of real numbers. For a better understanding, we begin with a brief review 
of the basic properties of limit points. 

Below we repeat the definition of a limit point from Sect. 1.1.5.


@ Limit point:
A point $x \in R$ is called a limit point of a set $S \subset R$ if every neighborhood
$V$ of $x$ contains an element of $S$ different from $x$.


We denote by $\hat{S}$ the set of limit points of $S$. A point in $S$ that is not a limit
point of $S$ is called an isolated point of $S$. A limit point is often referred to
as a cluster point or accumulation point.

Observe that $x$ is a limit point of $S$ if and only if every neighborhood of $x$
contains an infinite number of points of $S$. This is so because if a neighborhood

$$V = (x - \delta, x + \delta)$$

of a limit point $x$ contains only a finite number of points of $S$, say $a_1, a_2, \cdots, a_n$,
where $a_i \neq x$, then there is a positive number $\varepsilon$ such that

$$\varepsilon = \min_{1 \le i \le n} |a_i - x|.$$

Since $x$ is a limit point of $S$, there is a point $a \in S$ such that $a \neq x$ and
$|x - a| < \varepsilon$. This means that $a \in V$ but $a \neq a_i$ for any $i$, which contradicts the
assumption that $V$ contains only $n$ points of $S$. The implication in the other
direction is obvious.

A finite set cannot have a limit point, since any neighborhood of a limit
point must contain an infinite number of elements of the set. On the other
hand, an infinite set may or may not have a limit point.




A.2 Cantor Theorem 


The previous discussion raises the question: When does a set possess a limit
point? The following theorem serves as a lemma to answer this question.
In what follows, we denote by $\ell(I) = b - a$ the length of any closed interval
$I = [a, b]$ with $a < b$.


@ Cantor theorem:

Let $(I_n)$ be a sequence of nonempty, closed, and bounded intervals. If
$I_{n+1} \subseteq I_n$ for every $n \in N$, then the intersection $\bigcap_{n=1}^{\infty} I_n$ is not empty.
Furthermore, if

$$\inf\{\ell(I_n) : n \in N\} = 0,$$

then $\bigcap_{n=1}^{\infty} I_n$ is a single point.


Proof Suppose $I_n = [a_n, b_n]$ and $I = \bigcap_{n=1}^{\infty} I_n$. Using the nested property of
the intervals $I_n$, we have

$$m > p \ \Rightarrow \ I_m \subseteq I_p \ \Rightarrow \ a_p \le a_m \le b_m \le b_p. \qquad (A.1)$$

Clearly, the set $S = \{a_n : n \in N\} \subset R$ is not empty and is bounded above
by $b_1$. Hence, the set $S$ has a least upper bound, which we call $x$. If we can
prove that $x \in I$, we can conclude that $I$ is not empty. In fact, this can be
proven by observing that $x \in I_k$ for all $k \in N$, i.e.,

$$a_k \le x \le b_k \quad \text{for all } k \in N. \qquad (A.2)$$

First, it is obvious from the definition of $x$ that $a_k \le x$ for all $k$. Second,
$b_k$ for arbitrary $k$ satisfies $a_n \le b_k$ for all $n \in N$, i.e., $b_k$ is an upper bound of
$S$. In fact, if $n \le k$ then, by (A.1), $a_n \le a_k \le b_k$; and if $n > k$ then, again by
(A.1), $a_n \le b_n \le b_k$. Finally, it follows that $x \le b_k$ for all $k \in N$, since $x$ is a
least upper bound of $S$, whereas $b_k$ is just an upper bound of $S$. Thus we can
conclude that (A.2) is true.

Now we consider the second statement in the above theorem. Suppose that

$$\inf\{\ell(I_n) : n \in N\} = 0,$$

and let $x, y \in I$. It then follows that $x, y \in I_n$ for every $n$, which implies that

$$|x - y| \le \ell(I_n) \quad \text{for all } n \in N,$$

so

$$|x - y| \le \inf\{\ell(I_n) : n \in N\} = 0.$$

This means that $x = y$, i.e., the intersection

$$I = \bigcap_{n=1}^{\infty} I_n$$

is a single point. ∎


A.3 Bolzano—Weierstrass Theorem 


We are now ready to prove the Bolzano—Weierstrass theorem, which gives us 
sufficient conditions for the existence of a limit point in a set. 


@ Bolzano—Weierstrass theorem: 
Every infinite and bounded subset of R has at least one limit point in R. 


Proof Let $S$ be an infinite and bounded set of real numbers. Being bounded,
$S$ is contained in some bounded closed interval $I_0 = [a_0, b_0]$. First we bisect
$I_0$ into the two subintervals

$$I_0' = \left[a_0, \frac{a_0 + b_0}{2}\right], \quad I_0'' = \left[\frac{a_0 + b_0}{2}, b_0\right].$$

Since $S \subset (I_0' \cup I_0'')$ is infinite, at least one of the two sets $S \cap I_0'$ and $S \cap I_0''$
is infinite. Let $I_1 = [a_1, b_1] = I_0'$ if $S \cap I_0'$ is infinite; otherwise let $I_1 = I_0''$. We
then have

$$I_1 \subset I_0, \quad \ell(I_1) = \frac{b_0 - a_0}{2}.$$

Now we bisect $I_1$ into the subintervals

$$I_1' = \left[a_1, \frac{a_1 + b_1}{2}\right], \quad I_1'' = \left[\frac{a_1 + b_1}{2}, b_1\right],$$

one of which necessarily intersects $S$ in an infinite set. Let $I_2 = [a_2, b_2] = I_1'$ if
$S \cap I_1'$ is infinite; otherwise let $I_2 = I_1''$. Continuing in this fashion, we obtain
the intervals $I_i = [a_i, b_i]$, $0 \le i \le n$, which satisfy (for $i \ge 1$)

$$I_i \subset I_{i-1}, \quad \ell(I_i) = \frac{b_0 - a_0}{2^i},$$

and the fact that $S \cap I_i$ is infinite. We bisect $I_n$ again to obtain

$$I_n' = \left[a_n, \frac{a_n + b_n}{2}\right], \quad I_n'' = \left[\frac{a_n + b_n}{2}, b_n\right].$$

Since $I_n = I_n' \cup I_n''$ and $S \cap I_n$ is infinite, either $S \cap I_n'$ is infinite, wherein we
set $I_{n+1} = [a_{n+1}, b_{n+1}] = I_n'$, or $S \cap I_n''$ is infinite, where $I_{n+1} = I_n''$ is chosen.
Now we see that

$$I_{n+1} \subset I_n, \quad \ell(I_{n+1}) = \frac{b_0 - a_0}{2^{n+1}} = \frac{\ell(I_n)}{2},$$

and that $S \cap I_{n+1}$ is infinite. By induction, therefore, it is shown that there
exists a sequence $(I_n)$ of nonempty, closed, and bounded nested intervals. In view
of Cantor's theorem, we see that the intersection $\bigcap_{n=1}^{\infty} I_n$ contains at least one
point $x$. We now complete our proof by showing that $x$ is a limit point of $S$.

Suppose that $x \in \bigcap_{n=1}^{\infty} I_n$. Given any $\varepsilon > 0$, we can choose $n \in N$ so that

$$\frac{b_0 - a_0}{2^n} < \varepsilon,$$

or equivalently,

$$\ell(I_n) < \varepsilon.$$

This, together with the fact that $x \in I_n$ for all $n$, implies that

$$I_n \subset (x - \varepsilon, x + \varepsilon).$$

Since $I_n$ contains an infinite number of elements of $S$, so does the neighborhood
$(x - \varepsilon, x + \varepsilon)$ of $x$; hence $x$ is a limit point of $S$. ∎
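The bisection in the proof can be imitated on a computer, with one caveat: a program only ever sees finitely many points, so "the half containing infinitely many points of $S$" is replaced here by the heuristic "the half containing at least as many sample points". The sample set (the first 20000 values of $\sin k$) and the function name are assumptions made for illustration. This is a sketch of the construction, not a proof.

```python
import math

def bisect_towards_limit_point(points, a, b, steps):
    """Repeatedly halve [a, b], keeping the half that holds at least as many
    of the remaining sample points (a finite stand-in for 'infinitely many')."""
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        left = [p for p in points if a <= p <= mid]
        right = [p for p in points if mid <= p <= b]
        if len(left) >= len(right):
            b, points = mid, left
        else:
            a, points = mid, right
        intervals.append((a, b))
    return intervals, points

samples = [math.sin(k) for k in range(1, 20001)]   # bounded: all lie in [-1, 1]
intervals, survivors = bisect_towards_limit_point(samples, -1.0, 1.0, 8)
a8, b8 = intervals[-1]
print(f"I_8 = [{a8}, {b8}], length {b8 - a8}, sample points inside: {len(survivors)}")
```

Each step keeps at least half of the remaining points, so after 8 steps an interval of length $2/2^8$ still holds many samples. This mirrors how every $I_n$ in the proof contains infinitely many points of $S$.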


B 


Dirac δ Function


B.1 Basic Properties 


In this appendix, we review the properties and various expressions of Dirac's
δ function. The first thing to be noted is that the δ function is not a function
at all. A function is a rule that assigns another number to each number in a
set of numbers. However, the δ function as used in physics is rather a shorthand
notation for a complicated limiting process whose use greatly simplifies
calculations. It takes on a meaning only when it appears under an integral
sign, in which case it behaves as

$$\int_{-\infty}^{\infty} f(x)\delta(x)\,dx = f(0). \qquad (B.1)$$

For the special case $f(x) = 1$, we have

$$\int_{-\infty}^{\infty} \delta(x)\,dx = 1. \qquad (B.2)$$

If the singular point is located at an arbitrary point $x_0$, then

$$\int_{-\infty}^{\infty} f(x)\delta(x - x_0)\,dx = f(x_0). \qquad (B.3)$$

Except at the singular point $x = 0$,

$$\delta(x) = 0. \qquad (B.4)$$

Thus $\delta(x)$ vanishes at all points where its argument is not zero, but at that
one point it is undefined. Nevertheless its behavior near this point is all that
matters.


Let $\delta_\alpha(x)$ be a set of functions parametrized by the index $\alpha$ that has the
properties

$$\lim_{\alpha \to 0} \delta_\alpha(x) = 0 \quad \text{for all } x \neq 0,$$
$$\lim_{\alpha \to 0} \int_{-\infty}^{\infty} f(x)\delta_\alpha(x)\,dx = f(0). \qquad (B.5)$$

In precise terms the original equations defining the δ function must be
interpreted as standing for the limiting processes of (B.5).


B.2 Representation as a Limit of Function 


In what follows, we look at several sets of functions that are endowed with the
properties described in (B.5).

1. The limit of a box function

The simplest example is the function $\delta_c(x)$ defined (for $c > 0$) by

$$\delta_c(x) = \begin{cases} 1/c & \text{for } |x| < c/2, \\ 0 & \text{for } |x| > c/2. \end{cases} \qquad (B.6)$$

Clearly, $\lim_{c \to 0} \delta_c(x) = 0$ at all $x \neq 0$ and $\int_{-\infty}^{\infty} \delta_c(x)\,dx = 1$ independent of $c$.
In addition, we have

$$\lim_{c \to 0} \int_{-\infty}^{\infty} f(x)\delta_c(x)\,dx = f(0), \qquad (B.7)$$

which can be shown formally for continuous functions $f(x)$ as

$$\lim_{c \to 0} \int_{-\infty}^{\infty} f(x)\delta_c(x)\,dx = \lim_{c \to 0} \int_{-c/2}^{c/2} f(x)\frac{1}{c}\,dx = \lim_{c \to 0} \frac{1}{c}\int_{-c/2}^{c/2} f(x)\,dx$$
$$= \lim_{c \to 0} \frac{f(\xi c)}{c} \int_{-c/2}^{c/2} dx = \lim_{c \to 0} f(\xi c).$$

In the last line, the mean value theorem of integral calculus was employed
with the definition $-1/2 \le \xi \le 1/2$. Letting $c \to 0$, we obtain (B.7).


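2. The limit of a Gaussian function


The sequence of the Gaussian distribution function

$$\delta_\alpha(x) = \frac{1}{\alpha\sqrt{\pi}}\, e^{-x^2/\alpha^2}$$

provides another representation of the δ function. Note that $\lim_{\alpha \to 0} \delta_\alpha(x) = 0$
at all $x \neq 0$, $\int_{-\infty}^{\infty} \delta_\alpha(x)\,dx = 1$ independent of $\alpha$, and
$\lim_{\alpha \to 0} \int_{-\infty}^{\infty} f(x)\delta_\alpha(x)\,dx = f(0)$. The entire contribution to the integral,
as $\alpha \to 0$, comes from the neighborhood of $x = 0$. Therefore, we may write
symbolically:

$$\delta(x) = \lim_{\alpha \to 0} \delta_\alpha(x) = \lim_{\alpha \to 0} \frac{e^{-x^2/\alpha^2}}{\alpha\sqrt{\pi}}.$$


3. The limit of a Lorentzian function


Another useful representation for the δ function is

$$\delta(x) = \lim_{\varepsilon \to 0} \frac{1}{\pi} \frac{\varepsilon}{x^2 + \varepsilon^2},$$

which the reader can work out as in the above example.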
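The three representations above can be compared numerically. The sketch below is an assumed setup, not part of the original text: it approximates $\int f(x)\delta_w(x)\,dx$ with a midpoint rule on the truncated range $[-20, 20]$ for the test function $f(x) = \cos x$ and a small width $w = 0.01$. The function names, the truncation range, and the step count are illustrative choices.

```python
import math

def box(x, c):
    """Box representation (B.6): 1/c inside |x| < c/2, zero outside."""
    return 1.0 / c if abs(x) < c / 2 else 0.0

def gaussian(x, a):
    """Gaussian representation: exp(-x^2/a^2) / (a sqrt(pi))."""
    return math.exp(-(x / a) ** 2) / (a * math.sqrt(math.pi))

def lorentzian(x, eps):
    """Lorentzian representation: (1/pi) eps / (x^2 + eps^2)."""
    return eps / (math.pi * (x * x + eps * eps))

def smeared_value(f, delta, width, half_range=20.0, steps=80000):
    """Midpoint-rule estimate of the integral of f(x) * delta(x, width)."""
    h = 2 * half_range / steps
    total = 0.0
    for k in range(steps):
        x = -half_range + (k + 0.5) * h
        total += f(x) * delta(x, width)
    return total * h

if __name__ == "__main__":
    for name, rep in [("box", box), ("gaussian", gaussian), ("lorentzian", lorentzian)]:
        print(name, smeared_value(math.cos, rep, 0.01))   # each should be close to cos(0) = 1
```

The Lorentzian estimate converges most slowly because its tails decay only like $1/x^2$, so both the finite width and the truncated range leave a visible error.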


4. The $n \to \infty$ limit of a sequence of functions


The final representation of the δ function is slightly different from the
preceding three and plays a central role in the proof of the Weierstrass theorem,
as demonstrated in Appendix C. It is defined as

$$\delta_n(x) = \begin{cases} c_n (1 - x^2)^n & \text{for } 0 \le |x| \le 1 \quad (n = 1, 2, 3, \cdots), \\ 0 & \text{for } |x| > 1, \end{cases} \qquad (B.8)$$

where the constant $c_n$ must be determined so that

$$\int_{-1}^{1} \delta_n(x)\,dx = 1. \qquad (B.9)$$

The functions $\delta_n(x)$ form a sequence whose limit is a δ function. This
representation of the δ function differs from the others in that the defining
parameter $n$ increases to infinity, rather than decreasing continuously to zero.


B.3 Remarks on Representation 4 
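We show below that representation 4 above satisfies the conditions (B.5) required
for identification with Dirac's δ function. First, we determine the
normalization constant $c_n$. From the hypothesis (B.9), we have

$$\frac{1}{c_n} = \int_{-1}^{1} (1 - x^2)^n\,dx = 2\int_{0}^{1} (1 - x^2)^n\,dx. \qquad (B.10)$$

Making the change of variable $x = \sin\theta$, we obtain

$$\frac{1}{c_n} = 2\int_{0}^{\pi/2} \cos^{2n+1}\theta\,d\theta = \frac{2^{2n+1}(n!)^2}{(2n+1)!}, \qquad (B.11)$$

which becomes

$$c_n = \frac{(2n+1)!}{2^{2n+1}(n!)^2}. \qquad (B.12)$$

Next we consider the asymptotic behavior of $c_n$ as $n \to \infty$. It follows from
(B.10) that

$$\frac{1}{c_n} = 2\int_{0}^{1} (1 - x^2)^n\,dx \ge 2\int_{0}^{1/\sqrt{n}} (1 - x^2)^n\,dx, \qquad (B.13)$$

since $1/\sqrt{n} \le 1$ for all $n = 1, 2, \cdots$ and the integrand is nonnegative throughout $[0, 1]$.
Now we consider the function

$$g(x) = (1 - x^2)^n - 1 + nx^2.$$

Since $g(0) = 0$ and

$$g'(x) = 2nx\left[1 - (1 - x^2)^{n-1}\right] \ge 0 \quad \text{for all } 0 \le x \le 1,$$

$g(x)$ must be monotonically increasing in the interval $[0, 1]$. Therefore, $g(x) \ge 0$,
or equivalently,

$$(1 - x^2)^n \ge 1 - nx^2$$

for all $x$ in $[0, 1]$. Using this inequality in (B.13), we have

$$\frac{1}{c_n} \ge 2\int_{0}^{1/\sqrt{n}} (1 - nx^2)\,dx = \frac{4}{3\sqrt{n}} > \frac{1}{\sqrt{n}},$$

i.e.,

$$c_n < \sqrt{n}. \qquad (B.14)$$

This result implies that the limit $n \to \infty$ of the function $\delta_n(x)$ given in (B.8)
equals zero for all $x \neq 0$.
Finally, we examine the validity of the relation:

$$\lim_{n \to \infty} \int_{-1}^{1} f(x)\delta_n(x)\,dx = f(0). \qquad (B.15)$$

To prove this, it suffices to verify that the contribution to the integral
$\int_{-1}^{1} \delta_n(x)\,dx$ comes increasingly from the neighborhood surrounding the origin
as $n \to \infty$. Note that for $0 < \varepsilon < 1$,

$$\int_{-1}^{-\varepsilon} \delta_n(x)\,dx = \int_{\varepsilon}^{1} \delta_n(x)\,dx, \qquad (B.16)$$

since $\delta_n(x)$ is an even function of $x$. Now

$$\int_{\varepsilon}^{1} \delta_n(x)\,dx < \sqrt{n}\int_{\varepsilon}^{1} (1 - x^2)^n\,dx \le \sqrt{n}\,(1 - \varepsilon^2)^n (1 - \varepsilon) \le \sqrt{n}\,(1 - \varepsilon^2)^n, \qquad (B.17)$$

where we employ the fact that $(1 - x^2)^n$ in the interval $[\varepsilon, 1]$ takes its maximum
at $x = \varepsilon$. It is obvious that the decreasing behavior of the term $(1 - \varepsilon^2)^n$ with
$n$ dominates the increasing behavior of the term $\sqrt{n}$, so that

$$\lim_{n \to \infty} \int_{\varepsilon}^{1} \delta_n(x)\,dx = 0. \qquad (B.18)$$

The results (B.16) and (B.18) justify the desired relation (B.15).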
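The statements about $c_n$ and the tail integral lend themselves to a quick numerical check. The following sketch uses a simple midpoint rule; the helper names and the sample values of $n$ and $\varepsilon$ are assumptions. It evaluates the closed form (B.12), confirms the normalization (B.9) and the bound (B.14) for a range of $n$, and shows the tail weight in (B.18) shrinking as $n$ grows.

```python
import math

def c_n(n):
    """Normalization constant (B.12): (2n+1)! / (2^(2n+1) (n!)^2)."""
    return math.factorial(2 * n + 1) / (2 ** (2 * n + 1) * math.factorial(n) ** 2)

def midpoint(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

def total_weight(n):
    """Integral of delta_n over [-1, 1]; should equal 1 by (B.9)."""
    c = c_n(n)
    return midpoint(lambda x: c * (1 - x * x) ** n, -1.0, 1.0)

def tail_weight(n, eps):
    """Integral of delta_n over [eps, 1]; tends to 0 as n grows, cf. (B.18)."""
    c = c_n(n)
    return midpoint(lambda x: c * (1 - x * x) ** n, eps, 1.0)

if __name__ == "__main__":
    print("normalization, n = 5:", total_weight(5))
    print("c_n < sqrt(n) for n = 1..59:", all(c_n(n) < math.sqrt(n) for n in range(1, 60)))
    for n in (10, 50, 200):
        print(f"n = {n}: tail beyond 0.2 = {tail_weight(n, 0.2):.6f}")
```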


C 


Proof of Weierstrass Approximation Theorem 


@ Weierstrass approximation theorem:
If a function $f(x)$ is continuous on the closed interval $[a, b]$, there exists
a polynomial

$$G_n(x) = \sum_{k=0}^{n} c_k x^k \qquad (C.1)$$

that converges uniformly to $f(x)$ on $[a, b]$.


To prove this, we may assume without loss of generality that $f(x)$ is defined
on $[0, 1]$ and that $f(0) = f(1) = 0$. Outside the interval $[0, 1]$, we may define
$f(x)$ to be identically zero. Then, the relevant polynomial (C.1) is given by
the integral form as

$$G_n(x) = \int_{-1}^{1} f(x + t)\delta_n(t)\,dt, \quad 0 \le x \le 1. \qquad (C.2)$$

Here $\delta_n(t)$ is the sequence of the functions represented by

$$\delta_n(t) = \begin{cases} c_n (1 - t^2)^n & \text{for } -1 \le t \le 1, \\ 0 & \text{for } |t| > 1, \end{cases}$$

where $c_n$ is

$$c_n = \frac{(2n+1)!}{2^{2n+1}(n!)^2} \quad \text{so that} \quad \int_{-1}^{1} \delta_n(t)\,dt = 1.$$

(In fact, the sequence $\delta_n(t)$ as $n \to \infty$ does have the properties characterizing
a δ function; see Appendix B.) Since $f(x)$ is assumed to vanish outside the
interval $[0, 1]$, (C.2) can be rewritten as

$$G_n(x) = \int_{-x}^{1-x} f(x + t)\delta_n(t)\,dt.$$

By a change of variable $t \to t - x$, we obtain

$$G_n(x) = \int_{0}^{1} f(t)\delta_n(t - x)\,dt = \int_{0}^{1} f(t)\,c_n\left[1 - (t - x)^2\right]^n dt.$$

This last integral shows that $G_n(x)$ is a polynomial of degree $2n$ in $x$. In what
follows, we prove that the sequence of polynomials given by $\{G_1(x), G_2(x), \cdots\}$
converges uniformly to $f(x)$.

Since $f(x)$ is continuous on $[0, 1]$, there exists an appropriate infinitesimal
$\delta$ such that for a given $\varepsilon > 0$,

$$|f(x + \delta) - f(x)| < \varepsilon$$

for all $x$ in $[0, 1]$. Now, we use (C.2) for $G_n(x)$ to obtain the quantity

$$|G_n(x) - f(x)| = \left|\int_{-1}^{1} [f(x + t) - f(x)]\,\delta_n(t)\,dt\right| \le \int_{-1}^{1} |f(x + t) - f(x)|\,\delta_n(t)\,dt, \qquad (C.3)$$

where $\delta_n(t) \ge 0$ on $t \in [-1, 1]$. If the last term in (C.3) vanishes as $n \to \infty$, the
proof of the theorem is complete.
To show this, we break up the range of integration into three parts,

$$\int_{-1}^{1} = \int_{-1}^{-\gamma} + \int_{-\gamma}^{\gamma} + \int_{\gamma}^{1},$$

where $\gamma$ is a certain small positive number. Since $f(x)$ is continuous on a closed
interval, it is bounded there. Let the maximum value of $|f(x)|$ be $M$. Then
the last of the three integrals becomes

$$\int_{\gamma}^{1} |f(x + t) - f(x)|\,\delta_n(t)\,dt \le \int_{\gamma}^{1} |f(x + t)|\,\delta_n(t)\,dt + \int_{\gamma}^{1} |f(x)|\,\delta_n(t)\,dt$$
$$\le 2M \int_{\gamma}^{1} \delta_n(t)\,dt < 2M\sqrt{n}\,(1 - \gamma^2)^n, \qquad (C.4)$$

where we have used the inequality (B.17). Similar arguments yield

$$\int_{-1}^{-\gamma} |f(x + t) - f(x)|\,\delta_n(t)\,dt < 2M\sqrt{n}\,(1 - \gamma^2)^n. \qquad (C.5)$$

Finally, the remaining integral, $\int_{-\gamma}^{\gamma}$, is estimated by using the continuity of
$f(x)$, which guarantees that for any $\varepsilon$ we can find a small $\gamma$ that
satisfies

$$|t| < \gamma \ \Rightarrow \ |f(x + t) - f(x)| < \varepsilon.$$

This yields

$$\int_{-\gamma}^{\gamma} |f(x + t) - f(x)|\,\delta_n(t)\,dt < \varepsilon \int_{-\gamma}^{\gamma} \delta_n(t)\,dt \le \varepsilon, \qquad (C.6)$$

since $\int_{-\gamma}^{\gamma} \delta_n(t)\,dt \le 1$.

Collecting the results of (C.3)-(C.6), we have

$$|G_n(x) - f(x)| < 4M\sqrt{n}\,(1 - \gamma^2)^n + \varepsilon.$$

The value of $\sqrt{n}\,(1 - \gamma^2)^n$ for $0 < \gamma < 1$ can be set arbitrarily small for large
enough $n$ and, in particular, smaller than $\varepsilon$. Therefore, there exists an $N$ such
that

$$n > N \ \Rightarrow \ |G_n(x) - f(x)| < \varepsilon$$

for any arbitrarily small preassigned $\varepsilon$, where $N$ does not depend on $x$. This
means that the sequence of polynomials $G_n(x)$ converges uniformly to the
continuous function $f(x)$ on $[0, 1]$, which completes the proof. We emphasize
that the above discussion holds for an arbitrary continuous function on an
arbitrary finite closed interval $[a, b]$, as was indicated at the outset.
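The convergence claimed in the proof can be observed numerically. The sketch below is an illustration under stated assumptions: the test function $f(t) = \sin \pi t$ (which satisfies $f(0) = f(1) = 0$), a midpoint rule with 2000 steps, and a 21-point grid for the maximum error are all choices made here, not taken from the text. It evaluates $G_n(x) = \int_0^1 f(t)\,c_n[1 - (t - x)^2]^n\,dt$ and reports how far $G_n$ is from $f$ on the grid.

```python
import math

def c_n(n):
    """Normalization constant c_n = (2n+1)! / (2^(2n+1) (n!)^2)."""
    return math.factorial(2 * n + 1) / (2 ** (2 * n + 1) * math.factorial(n) ** 2)

def G(f, n, x, steps=2000):
    """Midpoint-rule estimate of G_n(x) = int_0^1 f(t) c_n [1 - (t - x)^2]^n dt."""
    c, h = c_n(n), 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += f(t) * c * (1 - (t - x) ** 2) ** n
    return total * h

def max_error(f, n, grid=21):
    """Largest |G_n(x) - f(x)| over an evenly spaced grid on [0, 1]."""
    xs = [i / (grid - 1) for i in range(grid)]
    return max(abs(G(f, n, x) - f(x)) for x in xs)

def f(t):
    return math.sin(math.pi * t)   # sample function with f(0) = f(1) = 0

if __name__ == "__main__":
    for n in (50, 400):
        print(f"n = {n}: max error on grid = {max_error(f, n):.4f}")
```

The error decays slowly, roughly like $1/\sqrt{n}$ near the endpoints where the zero extension of $f$ has a kink. The construction establishes existence and is not an efficient approximation scheme.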


Remark. It should be emphasized that our initial hypothesis that $f(0) =
f(1) = 0$ imposes no limitation on the validity of the proof. To see this, we
now suppose that $f(x)$ is defined on $[a, b]$. Then, the function $g(z)$ defined by

$$g\left(\frac{x - a}{b - a}\right) = f(x)$$

yields $f(a) = g(0)$ and $f(b) = g(1)$, where any $x$ in the interval $[a, b]$
corresponds to $z$ in $[0, 1]$. Furthermore, by introducing the function

$$h(z) = g(z) - g(0) - z[g(1) - g(0)]$$

for $z$ in $[0, 1]$, we have $h(0) = 0$ and $h(1) = 0$. We can show that the polynomial
$G_n(x)$ that approximates the original function $f(x)$ also approximates the
modified function $h(z)$ by replacing the variable $x$ in $G_n(x)$ by $z$.


D 


Tabulated List of Orthonormal Polynomial 
Functions 


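Hermite Polynomials $H_n(x)$


Orthogonality:

$$\int_{-\infty}^{\infty} e^{-x^2} H_m(x) H_n(x)\,dx = \sqrt{\pi}\, 2^n n!\, \delta_{mn}.$$

Recurrence formula:

$$H_{n+1}(x) - 2xH_n(x) + 2nH_{n-1}(x) = 0.$$

Generating functions:

$$g(t, x) = e^{2tx - t^2} = \sum_{n=0}^{\infty} H_n(x)\frac{t^n}{n!}.$$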


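Laguerre Polynomials $L_n^{\nu}(x)$


Orthogonality:

$$\int_{0}^{\infty} e^{-x} x^{\nu} L_m^{\nu}(x) L_n^{\nu}(x)\,dx = \frac{\Gamma(n + \nu + 1)}{n!}\, \delta_{mn}.$$

Rodrigues formula:

$$L_n^{\nu}(x) = \frac{e^{x} x^{-\nu}}{n!} \frac{d^n}{dx^n}\left(e^{-x} x^{\nu + n}\right).$$

Differential equation:

$$x\frac{d^2}{dx^2}L_n^{\nu}(x) + (\nu + 1 - x)\frac{d}{dx}L_n^{\nu}(x) + nL_n^{\nu}(x) = 0.$$

Recurrence formula:

$$(n + 1)L_{n+1}^{\nu}(x) - (2n + \nu + 1 - x)L_n^{\nu}(x) + (n + \nu)L_{n-1}^{\nu}(x) = 0.$$

Generating functions:

$$g(t, x) = \frac{e^{-xt/(1-t)}}{(1 - t)^{\nu + 1}} = \sum_{n=0}^{\infty} L_n^{\nu}(x)\,t^n.$$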


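Jacobi Polynomials $G_n^{(\nu,\mu)}(x)$


Orthogonality:

$$\int_{-1}^{1} (1 - x)^{\nu}(1 + x)^{\mu}\, G_m^{(\nu,\mu)}(x)\, G_n^{(\nu,\mu)}(x)\,dx = \frac{2^{\nu+\mu+1}\,\Gamma(n + \nu + 1)\,\Gamma(n + \mu + 1)}{n!\,(2n + \nu + \mu + 1)\,\Gamma(n + \nu + \mu + 1)}\, \delta_{mn}.$$

Rodrigues formula:

$$G_n^{(\nu,\mu)}(x) = \frac{(-1)^n}{2^n n!}(1 - x)^{-\nu}(1 + x)^{-\mu} \frac{d^n}{dx^n}\left[(1 - x)^{\nu+n}(1 + x)^{\mu+n}\right].$$

Differential equation:

$$(1 - x^2)\frac{d^2}{dx^2}G_n^{(\nu,\mu)}(x) + \left[\mu - \nu - (\nu + \mu + 2)x\right]\frac{d}{dx}G_n^{(\nu,\mu)}(x) + n(n + \nu + \mu + 1)\,G_n^{(\nu,\mu)}(x) = 0.$$

Recurrence formula:

$$G_0^{(\nu,\mu)}(x) = 1, \quad G_1^{(\nu,\mu)}(x) = \frac{1}{2}\left[(\nu + \mu + 2)x + \nu - \mu\right],$$

$$2(n + 1)(n + \nu + \mu + 1)(2n + \nu + \mu)\,G_{n+1}^{(\nu,\mu)}(x)$$
$$= (2n + \nu + \mu + 1)\left[(2n + \nu + \mu)(2n + \nu + \mu + 2)x + \nu^2 - \mu^2\right] G_n^{(\nu,\mu)}(x)$$
$$- 2(n + \nu)(n + \mu)(2n + \nu + \mu + 2)\,G_{n-1}^{(\nu,\mu)}(x).$$

Generating functions:

$$g(t, x) = \frac{2^{\nu+\mu}}{(1 - 2xt + t^2)^{1/2}\left[1 - t + (1 - 2xt + t^2)^{1/2}\right]^{\nu}\left[1 + t + (1 - 2xt + t^2)^{1/2}\right]^{\mu}} = \sum_{n=0}^{\infty} G_n^{(\nu,\mu)}(x)\,t^n.$$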
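Gegenbauer Polynomials $C_n^{\lambda}(x)$

Orthogonality:

$$\int_{-1}^{1} (1 - x^2)^{\lambda - \frac{1}{2}}\, C_m^{\lambda}(x)\, C_n^{\lambda}(x)\,dx = \frac{\sqrt{\pi}\,\Gamma(n + 2\lambda)\,\Gamma\left(\lambda + \frac{1}{2}\right)}{n!\,(n + \lambda)\,\Gamma(2\lambda)\,\Gamma(\lambda)}\, \delta_{mn}.$$

Rodrigues formula:

$$C_n^{\lambda}(x) = \frac{(-1)^n\,\Gamma(n + 2\lambda)\,\Gamma\left(\lambda + \frac{1}{2}\right)}{2^n n!\,\Gamma\left(n + \lambda + \frac{1}{2}\right)\Gamma(2\lambda)}\,(1 - x^2)^{-\lambda + \frac{1}{2}} \frac{d^n}{dx^n}\left[(1 - x^2)^{n + \lambda - \frac{1}{2}}\right].$$

Differential equation:

$$(1 - x^2)\frac{d^2}{dx^2}C_n^{\lambda}(x) - (2\lambda + 1)x\frac{d}{dx}C_n^{\lambda}(x) + n(n + 2\lambda)\,C_n^{\lambda}(x) = 0.$$

Recurrence formula:

$$(n + 1)C_{n+1}^{\lambda}(x) - 2(n + \lambda)x\,C_n^{\lambda}(x) + (n + 2\lambda - 1)C_{n-1}^{\lambda}(x) = 0.$$

Generating functions:

$$g(t, x) = \frac{1}{(1 - 2xt + t^2)^{\lambda}} = \sum_{n=0}^{\infty} C_n^{\lambda}(x)\,t^n.$$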


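Legendre Polynomials $P_n(x)$


Orthogonality:

$$\int_{-1}^{1} P_m(x) P_n(x)\,dx = \frac{2}{2n + 1}\, \delta_{mn}.$$

Rodrigues formula:

$$P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n}\left[(x^2 - 1)^n\right].$$

Differential equation:

$$(1 - x^2)\frac{d^2}{dx^2}P_n(x) - 2x\frac{d}{dx}P_n(x) + n(n + 1)P_n(x) = 0.$$

Recurrence formula:

$$(n + 1)P_{n+1}(x) - (2n + 1)xP_n(x) + nP_{n-1}(x) = 0.$$

Generating functions:

$$g(t, x) = \frac{1}{(1 - 2xt + t^2)^{1/2}} = \sum_{n=0}^{\infty} P_n(x)\,t^n.$$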
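The Legendre recurrence formula is also a practical way to evaluate $P_n(x)$. The sketch below generates values from $P_0 = 1$, $P_1 = x$ and checks the orthogonality relation with a midpoint rule. The function names and the integration step count are illustrative assumptions.

```python
def legendre(n, x):
    """P_n(x) from the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    p_prev, p = 1.0, x            # P_0 and P_1
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def legendre_overlap(m, n, steps=20000):
    """Midpoint-rule estimate of the integral of P_m P_n over [-1, 1]."""
    h = 2.0 / steps
    return h * sum(legendre(m, -1 + (k + 0.5) * h) * legendre(n, -1 + (k + 0.5) * h)
                   for k in range(steps))

if __name__ == "__main__":
    print("P_2(0.5) =", legendre(2, 0.5))                  # (3 * 0.25 - 1) / 2
    print("<P_2, P_3> ~", legendre_overlap(2, 3))          # close to 0
    print("<P_3, P_3> ~", legendre_overlap(3, 3), "vs 2/7 =", 2 / 7)
```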


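Chebyshev Polynomials of the First Kind $T_n(x)$


Orthogonality:

$$\int_{-1}^{1} \frac{T_m(x) T_n(x)}{\sqrt{1 - x^2}}\,dx = \frac{\pi}{2}\, \delta_{mn}\left(1 + \delta_{m0}\delta_{n0}\right).$$

Rodrigues formula:

$$T_n(x) = \frac{(-2)^n n!}{(2n)!}\sqrt{1 - x^2}\, \frac{d^n}{dx^n}\left[(1 - x^2)^{n - \frac{1}{2}}\right].$$

Differential equation:

$$(1 - x^2)\frac{d^2}{dx^2}T_n(x) - x\frac{d}{dx}T_n(x) + n^2 T_n(x) = 0.$$

Recurrence formula:

$$T_{n+1}(x) - 2xT_n(x) + T_{n-1}(x) = 0.$$

Generating functions:

$$g(t, x) = \frac{1 - t^2}{1 - 2xt + t^2} = T_0(x) + \sum_{n=1}^{\infty} 2T_n(x)\,t^n.$$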
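A short sketch of the Chebyshev recurrence in use: starting from $T_0 = 1$ and $T_1 = x$, the three-term relation reproduces the well-known identity $T_n(\cos\theta) = \cos n\theta$. The function name and sample angles are illustrative choices.

```python
import math

def chebyshev_t(n, x):
    """T_n(x) from the recurrence T_{k+1} = 2 x T_k - T_{k-1}."""
    t_prev, t = 1.0, x            # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(1, n):
        t_prev, t = t, 2 * x * t - t_prev
    return t

if __name__ == "__main__":
    for n, theta in [(3, 0.7), (5, 0.3), (8, 2.0)]:
        lhs = chebyshev_t(n, math.cos(theta))
        print(f"T_{n}(cos {theta}) = {lhs:.12f}, cos({n}*{theta}) = {math.cos(n * theta):.12f}")
```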


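Chebyshev Polynomials of the Second Kind $U_n(x)$


Orthogonality:

$$\int_{-1}^{1} \sqrt{1 - x^2}\; U_m(x) U_n(x)\,dx = \frac{\pi}{2}\, \delta_{mn}.$$

Rodrigues formula:

$$U_n(x) = \frac{(-2)^n (n + 1)!}{(2n + 1)!\,\sqrt{1 - x^2}}\, \frac{d^n}{dx^n}\left[(1 - x^2)^{n + \frac{1}{2}}\right].$$

Differential equation:

$$(1 - x^2)\frac{d^2}{dx^2}U_n(x) - 3x\frac{d}{dx}U_n(x) + n(n + 2)U_n(x) = 0.$$

Recurrence formula:

$$U_{n+1}(x) - 2xU_n(x) + U_{n-1}(x) = 0.$$

Generating functions:

$$g(t, x) = \frac{1}{1 - 2xt + t^2} = \sum_{n=0}^{\infty} U_n(x)\,t^n.$$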


Index 


Abscissa of absolute convergence, 411, 
412 

Abscissa of convergence, 410-412, 415 

Absolute convergence, 32, 38, 40, 42, 
67, 214, 383, 411, 412, 424 

Absolute convergence of an infinite 
series, 33, 38, 40 

Absolute maximum theorem, 207 

Absolute minimum theorem, 208 

Absolutely divergent series, 42 

Accumulation point, 6, 657 

Active transformation, 570, 571 

Addition, 83, 640, 641, 643, 646 

Addition formula for analytic continuation, 254

Addition identity, 83 

Addition of complex number, 74 

Addition of tensor, 586 

Addition of vector, 74 

Addition theorem of trigonometric 

function, 251 

Additive inverse, 83 

Adjoint operator, 502 

Admissibility condition, 450, 459 

Admissibility constant, 450 

Airfoil, 335, 336 

Aliasing, 394 

Almost everywhere, 156, 158, 160, 168, 
174, 456, 474 

Alternating sequence, 42 

Alternating series, 42 

Alternating series test, 42 

Amplitude modulation, 404 


Analytic continuation, 215, 216, 246, 
247, 249, 251-254, 284, 409, 420, 
421, 432 

Analytic continuations of each other, 
247, 248 

Analytic function, 102, 125, 186-188, 
191-194, 198, 201, 202, 204-208, 
210, 213, 217, 220, 236, 237, 244, 
251, 254, 259, 263, 286, 289, 290, 
305, 306, 409, 432, 436, 438 

Analyticity, 188, 190, 192, 193, 195, 
261, 312, 321, 412, 432 

Analyticity at infinity, 313 

Angle-preserving, 305, 306 

Angular momentum, 583, 591, 596 

Angular velocity, 596 

Anharmonic ratio, 322 

Antisymmetric part of tensor, 590 

Antisymmetric tensor, 589-591 

Approximation coefficient, 472, 478-480 

Area under the graph, 145, 382 

Associated Legendre function, 112 

Associative, 83, 387, 389, 646 

Associativity, 83 

Asymptotically stable critical point, 528 

Auto-correlation function, 390 

Autonomous system, 528 

Axial vector, 582 


Banach space, 86, 87 

Banach’s fixed point theorem, 173 
Basis, 79, 87, 88, 612, 642, 646, 649, 650 
Basis of a Hilbert space, 91 

Basis of tensor space, 647, 648 




Bernoulli equation, 505, 506 

Bernoulli’s theorem, 230, 232 

Bessel equation, 505 

Bessel function, 385, 403 

Bessel inequality, 79, 82, 89, 96, 104, 
357 

Beta function, 108 

Bilateral Laplace transform, 433 

Bilinear, 643, 651 

Bilinear function, 644, 645, 648, 651, 

652 

Bilinear transformation, 316, 317, 321, 

324, 325 

Binomial theorem, 24 

Bit reversal, 398 

Bit-reversing process, 399, 401 

Blasius' formula, 230-232

Bolzano-Weierstrass' theorem, 26, 27,
80, 657, 659

Boundary point, 7 

Bounded above, 3, 20, 21, 36-38, 43 

Bounded almost everywhere, 160 

Bounded below, 3, 20, 21 

Bounded closed interval, 5 

Bounded open interval, 5 

Bounded real sequence, 19-21, 26, 27 

Bounded set, 3 

Branch, 226, 241-244, 252, 272, 273, 
276, 421 

Branch at infinity, 313 

Branch cut, 226, 248, 244, 441, 442 

Branch line, 244 

Branch point, 235, 243, 244, 252, 282, 
441 

Brownian motion, 550, 551 


Cantor set, 155, 157 

Cantor’s theorem, 658 

Cardinal number, 155 

Carrier wave, 404-406 

Cartesian basis, 79 

Cartesian coordinate, 317, 319 

Cartesian coordinate system, 567, 568, 

576, 580, 582, 602, 603 

Cartesian product of vector spaces, 643 

Cartesian space, 78 

Cartesian tensor, 578, 589, 596, 601 

Cartesian tensor of the first order, 576, 
577 


Cartesian tensor of the fourth order, 
585, 600 

Cartesian tensor of the second order, 
578 

Cartesian tensor of the third order, 585 

Cartesian vector, 576-578, 582 

Cauchy criterion, 25-27, 31, 36, 38, 53, 
55, 59, 62, 64, 79 

Cauchy criterion for convergence, 31 

Cauchy criterion for uniform convergence, 53

Cauchy inequality, 208 

Cauchy principal value, 294 

Cauchy problem, 552, 553 

Cauchy sequence, 25-28, 36, 54, 55, 79, 
80, 82, 86, 91-94, 170, 171, 174, 
175, 496 

Cauchy’s integral formula, 124, 205-207, 
213, 217, 432 

Cauchy’s test for improper integrals, 
68, 425, 428 

Cauchy’s theorem, 198-201, 205, 210, 
220, 262, 274, 291 

Cauchy-Riemann relation, 189, 194, 
308, 309, 332 

Causality requirement, 300 

Center, 533, 535 

Central limit theorem, 175-178, 180 

Characteristic curve of a PDE, 542 

Characteristic equation of a PDE, 542 

Characteristic function, 176, 178, 180 

Characteristic polynomial, 529 

Chebyshev polynomial of the First 
Kind, 674 

Chebyshev polynomial of the first kind, 
119, 129-131, 133, 185 

Chebyshev polynomial of the Second 
Kind, 674 

Chebyshev polynomial of the second 
kind, 119 

Chebyshev’s inequality, 157, 158 

Christoffel symbol, 621-625, 627, 628, 
630-632, 635 

Christoffel symbol of the first kind, 623 

Christoffel symbol of the second kind, 
623 

Circle of convergence, 214-217, 221, 
222, 246, 250, 253, 254, 256 

Circulation (of fluid flow), 229 


Clairaut equation, 488 

Closed set, 7 

Closedness, 83, 430 

Closure, 5, 7 

Cluster point, 6, 657 

Cofactor, 524, 572-575 

Column-vector notation, 510 

Commutative, 83, 387, 389, 640, 646 

Complement, 2 

Complementary minor of an element of 

a matrix, 571 

Complementary set, 2, 7, 8, 149, 150, 

156, 157 

Complementary system, 514 

Complete, 73, 79, 80, 86-89, 91, 101, 

170, 173, 515 

Complete analytic function, 248 

Complete integral of an ODE, 487 

Complete orthonormal set of functions, 

73, 97, 98, 109, 463, 464 

Complete orthonormal set of polynomials, 101, 103-105, 117, 119

Complete orthonormal vectors, 89 

Completeness of wavelets, 462, 463 

Complex conjugate, 75, 533, 641 

Complex function, 185, 186 

Complex sphere, 311 

Complex vector space, 74—76, 83 

Component of a tensor, 565, 576, 578 

Conditional convergence, 32-35, 37, 42 

Conditional convergence of an improper 
integral, 67, 412 

Conditional convergence of an infinite 
series, 33 

Conformal, 305 

Conformal mapping, 306-308, 310-313, 
315-317, 321, 322, 324, 328, 330, 
331, 333-335 

Conjugate harmonic function, 192 

Conjugate linear, 75 

Conjugate space, 641 

Conservation law of current flow, 537 

Conservation law of momentum, 230, 
231, 373 

Conservation of a functional equation, 
250, 253 

Contact point, 5-9 

Continuity of complex function, 186 




Continuity theorem (for characteristic 
functions), 180 

Continuity theorem (of integrals), 67 

Continuous approximation, 472, 476 

Continuous function, 47, 50 

Continuous on the left, 48 

Continuous on the right, 48 

Contraction, 587-589, 591, 596, 598, 
618 

Contraction mapping, 173, 174 

Contraction mapping theorem, 173-175 

Contrapositive proof, 9 

Contravariant basis vector, 603, 604, 
606, 621 

Contravariant component of a tensor, 
608-610, 612, 613, 618, 619, 629 

Contravariant component of a vector, 
607, 611, 613, 617, 627, 628, 631, 
650, 653 

Contravariant degree of tensor, 645 

Contravariant local basis vector, 603 

Contravariant tensor, 646 

Contravariant vector, 646, 647 

Convergence test for alternating series,
42

Convergence almost everywhere, 156, 
171 

Convergence of a real sequence, 17 

Convergence of a sequence of vectors, 
80 

Convergence of an improper integral, 67 

Convergence of an infinite series, 30, 33 

Convergence of Laplace integral, 
408-411, 422, 424-427, 430-432, 
435 

Convergence test, 29, 38, 42 

Convolution, 387-389, 451, 453, 459, 
479 

Convolution integral, 447, 476 

Convolution theorem, 387 

Coordinate, 88, 567 

Coordinate axis, 567 

Coordinate transformation, 566, 567, 
570, 577, 580 

Corner, 50 

Correlation function, 388 

Countable set, 154, 155 

Covariant basis vector, 603-605, 608, 
613, 617, 619, 621 




Covariant component of a tensor, 609, 
610, 613, 619, 632 

Covariant component of a vector, 607, 
608, 611, 613, 617, 627, 628, 631, 
635, 650 

Covariant constant, 633 

Covariant degree of tensor, 645 

Covariant derivative, 628-633 

Covariant differentiation, 634, 635 

Covariant local basis vector, 602, 603 

Covariant tensor, 646 

Covariant vector, 608, 646, 647 

Critical point of an autonomous system, 
527-534 

Critical point of conformal mapping, 
308, 310 

Cross ratio, 322, 325 

Cross-correlation function, 388-390 

Curvature tensor, 635 

Curvilinear coordinate system, 565, 
601-603, 605, 607, 611, 615, 617, 
621 

Cut, 244 

Cylindrical coordinate system, 612, 616, 
621, 624 


D’Alembertian, 552

Damped harmonic oscillator, 383 

Damping time constant, 446 

Darboux’s inequality, 196, 209, 211 

Decomposition algorithm, 478, 479 

Decreasing sequence, 20 

Derivative (of a complex function), 186, 
187 

Derivative (of a real function), 48 

Determinant of a matrix, 571 

Difference, 2 

Differentiability (of a complex function), 
186, 188 

Differentiability (of a real function), 48 

Diffusion constant, 551 

Diffusion equation, 371, 545, 550, 551, 
561, 562 

Diffusion operator, 546, 550 

Dilatation parameter, 451, 454 

Dilation equation, 468 

Dimension of a vector space, 88 

Dirac’s δ-function, 661-663, 667


Direct product (of vector spaces), 643, 
645, 646 

Direct product (of vectors), 578 

Direct proof, 9 

Direct sum of vector spaces, 643 

Directed cosine, 568 

Direction field, 490, 494, 526 

Dirichlet boundary condition, 332-334,
540, 556

Dirichlet problem for the diffusion
equation, 551

Dirichlet problems for the Laplace
equation, 548

Dirichlet theorem, 360

Dirichlet’s conditions for the Fourier
series convergence, 341, 347

Dirichlet’s function, 149, 155, 156, 172 

Dirichlet’s integral, 353, 358 

Dirichlet’s kernel, 354 

Dirichlet’s theorem, 341 

Discrete Fourier transform, 391-394, 
396, 398, 400, 401 

Discrete wavelet, 460, 462, 463 

Discrete wavelet transform, 460-462, 
467, 472, 476, 478 

Disjoint interval, 141, 146 

Disjoint set, 2, 144, 145 

Dispersion relation, 297-302 

Displacement vector, 600 

Distance, 84, 174, 639 

Distance function, 84, 85 

Distribution, 176-180 

Distributive, 83, 387, 389, 646 

Divergence (as a vector operation), 631 

Divergent sequence, 18, 19 

Divergent series, 32, 33, 35, 42 

Divergent test, 32 

Dominated convergence theorem, 158, 
160, 161, 165, 166 

Dual basis, 642, 652 

Dual space, 641, 642, 645, 646 

Dummy index, 566, 626, 628, 629 

Dyadic grid arrangement, 461 

Dyadic grid wavelet, 461 


Eigenenergy, 136

Eigenfrequency, 557

Eigenfunction, 136, 501, 504, 505, 557
Eigenvalue, 501, 504, 505, 530-534, 544

Eigenvector, 530-534

Einstein tensor, 636

Einstein’s field equation, 635-637

Elasticity theory, 585, 600

Electric conductivity, 598

Electromagnetic field, 599

Element, 1-5, 7, 74, 76, 83

Elliptic class of PDEs, 544

Elliptic coordinate, 319

Elliptic coordinate system, 565

Elliptic integral of the first kind, 326

Empty set, 1, 141, 142 

Entire function, 191, 209, 313 

Enumerable, 154 

Equal, 2 

Equality almost everywhere, 156, 158, 
174, 175, 456, 474 

Equivalent, 10 

Essential singularity, 233, 235-240, 282 

Essential singularity at infinity, 313 

Euclidean space, 3, 74, 75, 515, 614, 
639, 640, 651 

Euler’s formula, 108 

Euler-Fourier formula, 340, 344 

Existence theorem, 491, 495, 498, 515 

Expected value of a random variable, 
143, 176 

Explicit solution of an ODE, 484 

Exponential order, 423-427, 431 

Extended definition of conformal 
mappings, 312 

Extended real number, 3 


False, 9 

Fast Fourier transform, 396, 397 

Fast Fourier transform (FFT), 396, 398, 
399, 401 

Fast orthogonal wave transform, 478 

Fast wavelet transform, 460, 477-480 

Father wavelet, 463, 470, 477 

Fejér’s integral, 353, 358 

Fejér’s theorem, 355, 360, 361 

Finite set, 1, 140, 154, 155, 657 

First shifting theorem, 415 

First-order Cartesian pseudotensor, 
582, 585 

First-order linear homogeneous ODE, 
484 




First-order linear homogeneous PDE,
541 

Fixed point in L?, 174 

Flat Riemann space, 614 

Flat space, 634-636 

Fluid flow, 229 

Focus, 533 

Four potential, 599 

Four-current density, 600 

Four-vector, 76 

Four-velocity, 636 

Fourier coefficient, 95-98, 105 

Fourier cosine series, 344, 345, 350 

Fourier integral, 383 

Fourier integral representation, 378 

Fourier integral theorem, 379, 380 

Fourier series, 95, 96, 339-341, 360, 363, 
366, 377 

Fourier sine series, 344, 350 

Fourier transform, 378, 382, 390, 391, 
406, 435, 559 

Fourier transform in three dimension, 
384 

Fourier transform in two dimension, 385 

Fractional transformation, 316 

Fraunhofer diffraction, 401, 403 

Frequency modulation, 404 

Fresnel cosine integral, 279, 281 

Fresnel sine integral, 279, 281 

Fubini’s theorem, 162-164, 166, 167, 
173, 175, 176, 178, 180 

Fubini-Hobson-Tonelli theorem, 164 

Function element, 247 

Function of exponential order, 423-427, 
431 

Function space, 172, 173 

Fundamental matrix, 518, 519, 521 

Fundamental mixed tensor, 611 

Fundamental sequence, 25 

Fundamental system of solutions, 
516-523 

Fundamental tensor, 585 


G.l.b., 4

Gamma function, 108, 254 

Gauss notation, 106 

Gegenbauer polynomial, 119, 673 
General analytic function, 248 
General relativity theory, 634, 636 




General solution of a differential 
equation, 316, 372, 375, 487-489, 
522, 530, 531, 533, 534, 540, 542, 
545, 553, 554, 557, 559 

Generalized Fourier coefficient, 91, 95 

Generalized Fourier series, 95 

Generating function, 113, 114, 124-126, 
470 

Generating function of Chebyshev 
polynomials of the first kind, 674 

Generating function of Chebyshev 
polynomials of the second kind, 
675 

Generating function of Gegenbauer 
polynomials, 673 

Generating function of Hermite 
polynomials, 125, 671 

Generating function of Jacobi
polynomials, 673

Generating function of Laguerre 
polynomials, 125, 672 

Generating function of Legendre 
polynomials, 113, 114, 125, 674 

Generating function of the
multiresolution analysis, 470

Geometric curvature, 634 

Gibbs phenomenon, 347, 365, 366 

Goursat’s formula, 206-208, 261, 265 

Gradient, 631 

Gradient of a scalar, 631 

Gradient of a vector, 580 

Gram-Schmidt orthogonalization 
method, 103, 105, 114, 505 

Greatest lower bound, 4, 411 

Green’s function, 558, 559 

Gutzmer’s theorem, 227 


Haar discrete wavelet, 462, 472, 473 

Haar wavelet, 450, 458, 467, 473 

Half-range Fourier series, 344, 347 

Harmonic function, 191, 195, 546, 549, 
550 

Harmonic series, 35, 36, 39 

Heat flow, 550 

Heat flux, 561 

Hermite equation, 502 

Hermite polynomial, 117, 125, 127, 135, 
671 

Hermitian operator, 503 


Hilbert space, 73, 74, 79-83, 87-92, 95, 
98, 352 

Hilbert space theory, 352 

Hilbert transforms pair, 295-298 

Holomorphic, 188 

Hooke’s law, 600 

Hyperbolic class of PDEs, 544, 546, 
552, 553 

Hyperharmonic series, 36 


Identically distributed, 176, 177, 179 

Identity vector, 74 

If and only if, 10 

Imaginary part of a complex function, 
185 

Implicit solution of an ODE, 485, 486, 
490 

Improper integral, 66-68, 302, 412, 420, 
422, 425-428, 430, 433 

Improper node, 530, 531 

Improper rotation, 580-585 

Incomplete inner product space, 81 

Incompressible, 228 

Increasing sequence, 19 

Independent random variable, 176, 177 

Index lowering, 652 

Index raising, 652 

Inertia tensor, 596-598 

Infimum, 4, 8, 140 

Infinite series, 29-33, 37, 38, 40, 42, 96, 
105, 109, 221, 339 

Infinite series of functions, 62-64, 227, 
281, 340, 342, 362, 496 

Infinite set, 1, 154, 155, 157, 657 

Initial value problem, 419, 491-493, 
495, 497-499, 510, 513, 527, 552, 
554, 555 

Inner measure, 150, 157 

Inner product, 73, 75, 76, 78, 80, 87, 89, 
96, 97, 352, 588, 639, 641, 651 

Inner product (in tensor calculus), 651, 
652 

Inner product notation, 502, 503 

Inner product space, 75-82, 86, 87, 640, 
651 

Integral curve, 488-490 

Integral equation, 492 

Integral function, 313 

Integral of PDE, 540 


Interior point, 7 

Intersection, 2, 87, 247, 658, 660 

Interval, 4 

Invariant, 609 

Invariant tensor, 585 

Inverse Fourier transform, 378, 379, 
384, 387, 395, 396, 406, 471 

Inverse Fourier transformation, 435 

Inverse Laplace transform, 408, 409, 
432, 434, 436, 439-441, 444, 446, 
448, 558 

Inverse matrix, 523, 574, 620, 653 

Inverse of discrete Fourier transform, 
392, 393 

Inverse of the two-sided Laplace 
transform, 434, 435 

Inverse wavelet transform, 456-458, 460 

Inversion (as a bilinear transformation), 
321, 327-329 

Inversion (as an improper rotation), 
581, 582 

Irrotational, 228, 231

Isolated point, 6-8, 95, 149, 212, 657 

Isolated singularity, 233-236, 239, 252, 
262, 263, 313 

Isomorphism, 98, 649 

Isomorphism between ℓ² and L², 98, 99

Isotropic tensor, 585, 600 


Jacobi matrix, 122 

Jacobi polynomial, 118, 672 

Jacobian determinant, 309, 388, 555, 
617 

Jordan’s lemma, 270, 438, 441 

Joukowsky airfoil, 336 

Joukowsky transformation, 335, 336 


Kinetic energy, 597 

Kramers-Kronig relations, 299 
Kronecker’s delta, 78, 610 
Kutta-Joukowski’s theorem, 228-231 


L’Hôpital’s rule, 12, 13, 239, 280, 282,
370, 417, 423 

L.u.b., 3 

Laguerre polynomial, 117, 118, 125, 671 

Lamé constants, 600 

Langevin’s function, 283 

Laplace transform, 408




Laplace equation, 191, 192, 228, 
331-334, 545, 546, 548-550 

Laplace integral, 408, 410, 411, 421, 
422, 428, 429, 432, 433 

Laplace operator, 546 

Laplace transform, 407-409, 412,
414-418, 420, 422, 432, 444-447, 
558, 559 

Laplace transform of derivative, 419 

Laplace transform of integral, 420 

Laplacian, 545, 546, 548-550, 559, 630, 
631 

Laurent series expansion, 219-224, 226, 
233-235, 238, 239, 260, 265, 266 

Least upper bound, 3, 658 

Lebesgue convergence theorem, 173, 
175, 178, 180, 181 

Lebesgue integrable, 152, 153, 161, 173, 
174 

Lebesgue integral, 139, 141, 144, 147, 
149, 151-155, 158, 161, 162, 167, 
172, 175, 176 

Lebesgue measurable function, 167, 170 

Lebesgue measurable set, 157 

Lebesgue measure, 149-151, 154-156, 
176 

Lebesgue sum, 151-153 

Left-hand limit, 46, 48, 408 

Left-handed coordinate system, 568, 
582 

Legendre polynomial, 105-107, 109, 
112-114, 119, 125, 127, 136, 137, 
673 

Legendre’s equation, 501 

Levi-Civita symbol, 584, 588 

Lift, 336 

Lift force, 228 

Limit, 18 

Limit cycle, 536, 538 

Limit inferior, 21, 22, 140 

Limit of a function, 45 

Limit point, 6, 18, 657 

Limit superior, 21, 22, 40-42, 140 

Limit test for convergence, 38, 39, 43 

Limit test for divergence, 39, 40, 43 

Line element, 490 

Linear autonomous system, 528 

Linear differential equations, 407 

Linear function, 640 




n-linear function, 644 

Linear homogeneous ODE, 484, 500 

Linear homogeneous PDE, 541 

Linear homogeneous system of ODEs, 
514, 516, 517, 524, 528 

Linear independence, 76, 79, 82, 88, 
103, 375, 515-517, 519, 520, 522, 
523, 569, 642 

Linear inhomogeneous ODE, 505, 506 

Linear inhomogeneous PDE, 541 

Linear inhomogeneous system of ODEs, 
514, 516 

Linear mapping (of vector spaces), 640, 
650, 651 

Linear ODE, 484 

Linear partial differential equation 
(PDE), 540 

Linear space, 640 

Linear transformation, 315, 316, 514,
543, 544

Liouville’s formula, 518, 521, 524 

Liouville’s theorem, 209, 238, 313 

Lipschitz condition, 174, 495, 497, 500, 
512, 515, 526 

Lipschitz constant, 495, 512 

Local basis vectors, 602 

Localization theorem, 368 

Logarithmic residue, 286, 287 

Logistic equation, 506 

Lower bound, 3 

Lower limit, 21 

Lower Riemann-Darboux integral, 140 


Möbius transformation, 316, 322
Magnetic susceptibility, 598 

Maximum, 4 

Maxwell equation, 599, 637 
Maxwell-Boltzmann distribution, 177 
Mean convergence, 95, 97, 105, 351-353, 
355-357, 360, 361 

Mean value of a random variable, 143,
144, 176

Mean value theorem, 58, 204, 560, 562, 
662 

Measurable set, 151, 157, 160, 164, 165, 
167 

Measure, 141 

α-measure, 141, 144, 146-148

Message wave, 404-406 


Method of inversion, 327 

Method of variation of constant 
parameters, 522 

Metric coefficient, 616 

Metric space, 84, 85 

Metric tensor, 611-614, 617-619, 621, 
623, 624, 626, 630, 631, 633-635, 
637 

Metric vector space, 84 

Mexican hat wavelet, 450-452, 455, 467 
Minimax property, 129 

Minimum, 4 

Minkowski’s inequality, 93, 169-171, 
175 

Minor of elements of a matrix, 571 
Mixed component of a tensor, 609-612, 
618 

Modified summation convention, 603, 
605, 606 

Moment of inertia, 597, 609 
Monotone convergence theorem, 
158-161, 165, 166, 171 
Monotonic sequence, 20 
Monotonically decreasing sequence, 20 
Monotonically increasing sequence, 19 
Morera’s theorem, 210 

Mother wavelet, 467-470, 477 
Multilinear function, 644-646 
Multiplication of complex number, 74 
Multiply connected region, 200 
Multipole, 136 

Multiresolution algorithm, 478 
Multiresolution analysis, 463, 464, 
466-470, 474, 477 
Multiresolution analysis equation, 468 
Multiresolution representation, 472 
Multivalued function, 226, 240-244, 
252, 318, 409, 420, 441 


Natural boundary, 249, 253, 256 
Natural isomorphism, 649-651 
Natural pairing, 643 

Necessary and sufficient condition, 10 
Necessary condition, 9 

Neighborhood, 5-8, 10, 12, 18, 24, 66, 
67, 187, 188, 218, 233-236, 239, 
244, 250, 260, 286, 308, 312, 313, 
316, 369, 528, 657, 660 


Neumann boundary condition, 332, 540, 
556 

Neutrally stable critical point, 528 
Newtonian field of gravity, 636 

Noise reduction, 457, 458 
Non-degenerate, 651 

Non-isolated singularity, 235 
Nonhomogeneous linear partial 
differential equation, 541 
Nonlinear differential equation, 484, 
505, 506, 522, 538, 637 
Nonoverlapping sets, 141 

Norm, 76, 80, 85-87, 89, 91, 94-96, 174, 
352, 473, 510, 639 

p-norm, 85-87, 168, 175 

Normal distribution, 177-180 

Normed space, 85-87 

Null measure, 155, 157 

Nyquist critical frequency, 393 


Once-subtracted dispersion relation, 
300 

One-sided derivative, 49 

One-sided limit, 46 

Open set, 7 

Order of differential equation, 483 

Order of zero of function, 233 

Ordinary differential equation (ODE), 
174, 483 

Orthogonal basis, 88 

Orthogonal complement, 465 

Orthogonal curvilinear coordinate, 317 

Orthogonal decomposition, 466 

Orthogonal polynomial, 114-117, 119, 
121-124, 126, 128, 129 

Orthogonal relation, 579 

Orthogonality, 73, 78, 79, 82, 88-90, 
103, 105, 107, 127 

Orthogonality relation, 115, 119, 120, 
129, 133 

Orthonormal basis, 73, 78, 88, 463, 464, 
466-471 

Orthonormality, 78, 101 

Orthonormality of wavelets, 462, 463 

Outer measure, 150, 156 

Outer product, 578, 588, 595, 610 


Parabolic class of PDEs, 544 
Parallelogram law, 78, 87 




Parseval’s identity, 90, 97, 98, 104, 302, 
356, 357, 362, 383, 390 

Parseval’s identity (for wavelet 
transform), 457, 460, 474 

Partial differential equation (PDE), 
371, 539 

Partial sum, 30, 31, 35-38, 43, 64, 99, 
104, 109, 256, 345, 346, 353, 358, 
363, 365, 366, 368, 369, 496, 500 

Particular solution of differential 
equation, 487, 506, 522, 523, 553 

Partition, 140, 152 

Passive transformation, 570, 571 

Path independence, 198 

Permutation symbol, 584 

Phase space, 527, 538 

Picard’s method, 491, 492, 497, 499 

Piecewise continuous function, 48, 50, 
362-364, 385, 417 

Piecewise smooth function, 50, 360, 
363, 364, 379, 380, 386 

Plancherel’s identity, 390 

Point, 1 

Point at infinity, 237, 238, 244, 312, 
313, 320, 322, 329 

Point of equilibrium of an autonomous 
system, 527 

Pointwise convergence, 51, 52, 54, 60, 
62, 95, 158, 173, 360, 363, 364 

Pointwise limit, 51 

Poisson’s equation, 547, 636 

Poisson’s integral formula, 213 

Polar coordinate system, 108, 194, 244, 
280, 310, 315, 317-319, 323, 403, 
549, 565 

Polar vector, 582 

Pole, 233-238, 240 

Pole at infinity, 313 

Positive definite, 651 

Potential field, 114, 136, 137, 228, 229,
232, 333-335, 583, 599, 636 

Power spectrum, 383, 390, 405, 406 

Pre-Hilbert space, 86, 87 

Primitive integral of an ODE, 487 

Principal part in the Laurent series, 
222, 234, 235, 238 

Principal value integral, 68, 206, 
293-296, 300 

Probability, 143, 176, 177 




Probability density, 136, 176, 177 

Probability density function, 143 

Probability distribution function, 143,
144

Product of inertia, 597 

Proof by contradiction, 9 

Proper node, 532, 533 

Proper rotation, 581, 582, 584, 585 

Proper subset, 2 

Pseudotensor, 580, 582 

Pseudovector, 582, 583, 590 

Pyramid algorithm, 478 

Pythagorean formula, 73 


Quantum mechanics theory, 135 
Quotient law, 592 


Radius of convergence, 102, 214-219, 
223, 245, 246, 252, 256, 257 
Random variable, 143, 144, 176, 177, 
179, 180 

Range convention, 566, 568 

Ratio method, 263, 266, 267, 278 

Ratio test for convergence, 40, 44 

Rational function, 132, 237, 238, 267, 
268, 271, 273 

Real part of a complex function, 185 

Real vector space, 83 

Reality condition, 298 

Rearrangement, 34, 35, 38 

Reconstruction algorithm, 478-480 

Rectangular Cartesian coordinate 
system, 565, 567, 570, 606 

Recurrence formula for orthogonal 
polynomials, 119-121, 125-127, 
129, 671-675 

Recurrence relation (for analytic 
continuation), 254 

Recurrence relation (of gamma 
functions), 255 

Recurrence relation (of scaling 
functions), 468 

Reduced system, 514 

Refinement equation, 468 

Reflection, 581-583 

Region of analyticity, 206-208, 250 

Region of convergence of the Laplace 
integral, 410, 425, 427, 430-435, 
444 


Region of the existence, 249 

Regular, 188 

Regular analytic, 188 

Regular part in the Laurent series, 222 

Regular point, 527 

Removable singularity, 233, 234, 236 

Residue, 259, 260, 263-268, 272-274, 
281, 282, 285, 286 

Residue theorem, 259, 260, 263, 267, 
269, 274, 277-279, 281, 286, 437 

Riccati equation, 506 

Ricci curvature, 637 

Ricci scalar, 636 

Ricci tensor, 636 

Ricci’s theorem, 630, 633, 634 

Riemann integrable, 140 

Riemann integrable function, 158, 172 

Riemann integral, 139, 140, 144, 152, 
153, 158, 172, 175 

Riemann space, 614 

Riemann sphere, 311-313 

Riemann sum, 144, 153 

Riemann surface, 241-243, 421, 422, 
441 

Riemann tensor, 635, 636 

Riemann’s theorem, 35 

Riemann-Darboux integral, 140 

Riemann-Lebesgue theorem, 358, 364, 
369 

Riemann-Stieltjes integral, 144 

Riemann-zeta function, 284 

Riesz-Fischer’s theorem, 96, 98

Right-hand limit, 46, 48, 408 

Right-handed coordinate system, 567 

Rigid rotation of coordinate axes, 
567-570, 575, 576, 580, 583, 585 

Rodrigues formula, 106, 107, 114-119, 
121, 123, 124, 127, 129, 671-675 

Root test for convergence, 41, 44, 253 

Rotation (as a vector operation), 632 

Rotation (of fluid flow), 229 

Rotation with reflection, 581 

Rouché’s theorem, 210, 290-292 


Saddle point, 531, 532 
Sampling theorem, 394 
Scalar, 83, 579, 609, 646 
Scalar curvature, 636, 637 


Scalar multiplication, 74, 83, 640, 641, 
643 

Scalar product, 75, 579, 604, 607, 611, 
613, 617, 618 

Scale factor, 308, 310, 321, 616, 617, 634 

Scale-dependent thresholding, 458 

Scaling function, 463, 464, 467-471,
474, 476, 479

Scaling function coefficient, 468-470,
474, 477

Scaling function space, 467 

Schrödinger equation, 136

Schwarz differential equation, 316 

Schwarz Lemma, 211 

Schwarz principle of reflection, 254, 257 

Schwarz’s inequality, 76, 77, 89 

Schwarz-Christoffel transformation, 
325-327, 329, 332, 333 

Second shifting theorem, 416 

Second-order Cartesian pseudotensor, 
584 

Second-order linear homogeneous PDE, 
543 

Secular equation, 529 

Self-adjoint operator, 503 

Semiclosed interval, 5 

Sequence of partial sum, 30, 31, 43, 104, 
109 

Sequence of the remainder, 30 

Set, 1 

Shuffled sequence, 28, 37 

Signal approximation, 472 

Signal detail, 472 

Simple Laplace development, 572 

Simple set, 146 

Simple statement, 9 

Simply connected region, 199 

Single-valued function, 84, 240, 241, 
243, 247, 248, 288-290, 408, 414, 
421, 484, 510, 614 

Singular line, 253 

Singular point of an autonomous 
system, 527 

Singular solution of ODE, 488, 489 

Singularity, 188, 233 

Skew-symmetric, 589 

Smooth function, 50 

Solution of ODE, 484 

Solution of PDE, 540 



L² space, 81, 87, 92, 95, 98, 99, 352

Lᵖ space, 86, 87

ℓ² space, 80, 87, 91, 92, 95, 96, 98, 99

ℓᵖ space, 86, 87

Specific heat, 562 

Spectrum of Sturm-Liouville system, 
501 

Spherical coordinate system, 384, 549, 
550, 616 

Spherical harmonic function, 109, 111, 
112 

Spiral point, 533, 534 

Square-integrable function, 81, 92, 94, 
95, 98, 101, 352, 357, 358 

Stability of critical point, 527, 528 

Stable critical point, 528 

Step function, 81, 144-148, 365, 366, 
416, 417, 436, 440, 445 

Strain tensor, 600 

Stream function, 228 

Stress tensor, 600 

Strictly decreasing sequence, 20 

Strictly increasing sequence, 19 

Strictly stable critical point, 528 

Sturm-Liouville equation, 500-502, 505 

Sturm-Liouville operator, 500, 503, 504 

Sturm-Liouville system, 501, 504 

Subinterval, 5 

Subset, 1 

Subtraction of tensor, 586 

Successive approximation, 492, 493, 
495, 499 

Sufficient condition, 9 

Sum of infinite series, 30 

Sum of infinite series of functions, 62 

α-summable, 146, 147

Summation convention, 566, 568, 602 

Support, 144-146 

Supremum, 3, 4, 8, 11, 140 

Symmetric Cartesian tensor of the
second order, 597

Symmetric part of tensor, 590 

Symmetric tensor, 589 


Taylor series expansion, 49, 102, 212, 
217-219, 222-225, 239, 246, 254, 
260, 263, 265, 266, 284, 294, 500, 
528 

Tensor, 565, 645 




Tensor of the first order, 609 

Tensor of the second order, 609 

Tensor of zero order, 579, 609 

Tensor product, 644-647 

Tensor space, 646-648, 650 

Thermal conductivity, 372, 561 

Total variation of argument, 288-290 

Trajectory, 526-528, 530 

Transfinite number, 155 

Translation parameter, 451, 453 

Tree algorithm, 478 

Triangle inequality, 77, 92, 169 

Trigonometric Fourier series, 109, 339, 
340 

Trigonometric series, 339, 340, 355, 360 

True, 9 

Tunnel diode, 536 

Two-scale relation, 468, 469, 477, 478 

Two-sided Laplace integral, 433, 434 

Two-sided Laplace transform, 433-435, 
444 


Unbounded open interval, 5

Unbounded set, 3

Uniform convergence (of
complex-function sequence), 213, 214, 217,
227

Uniform convergence (of Fourier series),
341, 342, 352, 355, 359-362, 366

Uniform convergence (of improper
integral), 67, 68, 380

Uniform convergence (of Laplace
integrals), 425-427, 429-432

Uniform convergence (of polynomial
sequence), 101, 102, 104

Uniform convergence (of real-function
sequence), 52-68, 95, 158, 172, 497

Union, 2

Uniqueness of the integral, 198

Uniqueness theorem (for analytic
continuation), 250, 252, 254

Uniqueness theorem (for characteristic
function), 176, 179

Uniqueness theorem (for solution of
ODE), 136, 491, 497, 498, 515,
516, 526

Uniqueness theorem of the Dirichlet
problem (for the diffusion
equation), 552




Uniqueness theorem of the Dirichlet
problem (for the Laplace
equation), 548

Unit scalar, 83

Unitary space, 75

Universal gravitational constant, 636

Universal set, 2

Unstable critical point, 528

Upper bound, 3

Upper limit, 21

Upper Riemann-Darboux integral, 140




Van der Pol equation, 538 
Vanishing order, 11 
Variation of argument, 288 
Vector, 74 

Vector space, 73, 74, 83, 640 
Velocity potential, 228 


Wave equation, 373, 545, 552, 555, 556, 
558, 559 

Wave operator, 546, 552, 553 

Wavelet, 449 

Wavelet analysis, 449 

Wavelet coefficient, 469, 472, 477, 478 

Wavelet space, 467 

Wavelet transform, 451-456, 458, 460 

Weierstrass approximation theorem, 
101 

Weierstrass’ M test, 63 

Weierstrass’ test for improper integral, 
68, 426 

Weight function, 75, 115, 452, 500, 502, 
505 

Wiener-Kinchin’s theorem, 389, 390 

Wigner-Seitz cell, 348 

Winding number, 262, 263, 291 

Wronskian, 518, 521 

Wronsky determinant, 518, 521 


Zero of function, 105, 122, 129, 130, 
132, 133, 207, 209, 210, 212, 264, 
286, 289, 367, 404 

Zero scalar, 83 

Zero vector, 78, 83, 89, 90 

Zeros of function, 264 

Zeta function, 36 


